Speakers - PHWC2025

Li Chin Chen

  • Designation: Data Analytics and Digital Transformation Research Center, National Taiwan University
  • Country: Taiwan
  • Title: Constructing a General Demographic Foundational Model Based on Representation Learning Technology

Abstract

Deep learning and foundational models have achieved strong results across many applications, improving machine learning performance without relying on manual feature engineering. Adoption in public health, however, remains limited. One reason is that most public health, medical, and governmental data is tabular, a format for which effective pre-trained or foundational models are still lacking. Moreover, existing models are not designed to support public health initiatives. Because public health policy is formulated from prevailing population dynamics and disease distributions, it is important to combine the large body of accumulated data with deep learning methods to support industry transformation and inform policy development.
This study aims to develop representations and foundational models suited to public health research and applications. They are built from the most widely available demographic information, namely gender and age, and trained on a large health insurance database, with the goal of supporting future public health policy formulation. Three approaches were explored: (1) a traditional approach that encodes the information with one-hot encoding and min-max scaling; (2) an approach that encodes age with positional embedding and integrates a gender encoding; and (3) an approach that converts the tabular data into text and encodes the resulting sentences with a large language model (LLM). The models will first be trained on Taiwan's National Health Insurance Database, which covers 98% of the population. After training, we aim to validate their generalizability on open public health datasets from OpenML, designing individual tasks to assess whether the pre-trained models improve performance.
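
As a rough illustration of the three encoding approaches named in the abstract, the sketch below encodes a single (age, gender) record in each style. It is not the speaker's implementation: the age bound, the two-category gender set, the sinusoidal form of the positional embedding, the embedding size, and the sentence template are all assumptions made for the example, and the third function only builds the sentence that an LLM encoder would then embed.

```python
import math

MAX_AGE = 120          # assumed upper bound for min-max scaling
GENDERS = ("F", "M")   # assumed category set, for illustration only


def one_hot_gender(gender):
    """Return a one-hot vector over the assumed gender categories."""
    return [1.0 if gender == g else 0.0 for g in GENDERS]


def encode_traditional(age, gender):
    """Approach 1: one-hot gender plus min-max scaled age in [0, 1]."""
    scaled_age = (age - 0) / (MAX_AGE - 0)
    return one_hot_gender(gender) + [scaled_age]


def encode_positional(age, gender, dim=8):
    """Approach 2: sinusoidal positional embedding of age (one possible
    choice of positional embedding), with the gender one-hot appended."""
    embedding = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        embedding.append(math.sin(age * freq))
        embedding.append(math.cos(age * freq))
    return embedding + one_hot_gender(gender)


def to_sentence(age, gender):
    """Approach 3: serialize the tabular row as text. The template is
    hypothetical; the sentence would be passed to an LLM encoder."""
    label = {"F": "female", "M": "male"}[gender]
    return f"The patient is a {age}-year-old {label}."


record = (60, "F")
print(encode_traditional(*record))
print(encode_positional(*record))
print(to_sentence(*record))
```

The first approach yields a short, fixed-length vector; the second spreads age across several periodic features so that nearby ages receive similar embeddings; the third defers the representation to whichever language model is used to encode the sentence.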
