Sungjoon Park


Bio

Sungjoon Park is an AI research scientist at the Materials Intelligence Lab within LG AI Research. He completed his PhD in Computer Science and Engineering at Seoul National University in 2024. His research is centered on utilizing AI, machine learning, and deep learning technologies to reduce drug discovery costs by predicting the molecular characteristics of chemical compounds and analyzing bioinformatics data, including multi-omics and clinical health records.

In addition to his expertise in software programming, Sungjoon has experience with cloud computing infrastructure, Linux server maintenance, and full-stack web development. These experiences outside of research have equipped him with strong communication skills and an understanding of the needs of technicians, designers, promoters, and decision-makers.

In his personal life, Sungjoon enjoys reading books across various subjects, cycling, sport climbing, and playing video/board games. He values these activities for providing relaxation and personal growth, complementing his professional pursuits.

Work Experience

  • Research Scientist
    '24 - Present
    Materials Intelligence Lab, LG AI Research
  • Postdoctoral Researcher
    '24
    Bioinformatics Institute, Seoul National University

Education

Research Projects

  • AD biomarker discovery
    '24 - Present
    • Development of a phenotype prediction model for Alzheimer's disease (AD) utilizing AD-BXD mouse models and human multi-omics data

  • AI-based drug discovery
    '19 - '24
    • In silico virtual high-throughput screening & absorption, distribution, metabolism, excretion and toxicity (ADMET) prediction with machine learning and AI techniques
    • Linking the large in vivo clinical database of The Cancer Genome Atlas (TCGA) to in vitro experiments from the Cancer Cell Line Encyclopedia (CCLE) using matrix factorization on the cloud to recommend personalized medicine
    • Prediction of drug side-effect frequency by mapping drugs and side effects onto a common embedding space using deep learning and ensemble methods

  • COPD pathogenesis study using ML
    '20 - '23
    • Etiological study of environmental factors such as cigarette smoke extract and particulate matter in chronic obstructive pulmonary disease (COPD) proteome data from air-liquid interface (ALI) cultured cells
    • Cross-validation against independent public single-cell transcriptomics data from the Sequence Read Archive (SRA)

  • Multi-omics integrative analysis on the cloud
    '20 - '22
    • Survey of machine learning methods for investigating gene regulation using multi-omics data
    • Deploying an integrative analysis pipeline on Amazon Web Services (AWS) by combining tools for single nucleotide variations (SNVs), transcriptomics, copy number variations (CNVs), and DNA methylation

  • Homomorphic encryption for SNP arrays
    '18 - '19
    • Assaying genomic sequences to find single nucleotide polymorphisms (SNPs) without sharing the private key of the encrypted patient DNA with the hospital
    • Design of the first secure SNP panel scheme to encrypt genomic data using an open-source homomorphic encryption library (HEAAN)

Skills

AI Model design
  • Deep learning (PyTorch)
  • Machine learning
  • Genetic algorithm
  • Ensemble methods
Cloud Experience
  • Amazon Web Services (AWS)
  • Oracle Cloud Infrastructure (OCI)
Server Maintenance
  • CentOS
  • Debian/Ubuntu
Web Full-stack
  • HTML/CSS/JavaScript
  • jQuery & Bootstrap
  • Node.js
  • Django
Programming Languages
  • C/C++
  • Python
  • JavaScript
  • Shell script
  • SQL

Publications

Articles

  1. M Pak, D Jeong, S Park, J Gu, S Lee and S Kim. "ALPACA: A Visual Data Mining System for Subcellular Location-specific Knowledge Mining from Multi-Omics Data in Cancer." BMC Medical Genomics (accepted, in press).
  2. S Park, S Lee, M Pak and S Kim. "Dual representation learning for predicting drug-side effect frequency using protein target information." IEEE Journal of Biomedical and Health Informatics (2024). doi:10.1109/jbhi.2024.3350083
  3. JK Yoon, S Park, KH Lee, D Jeong, J Woo, J Park, SM Yi, D Han, CG Yoo, S Kim and CH Lee. "Machine Learning-Based Proteomics Reveals Ferroptosis in COPD Patient-Derived Airway Epithelial Cells Upon Smoking Exposure." Journal of Korean Medical Science 38.29 (2023). doi:10.3346/jkms.2023.38.e220
  4. S Park, D Lee, Y Kim, S Lim, H Chae and S Kim. "BioVLAB-Cancer-Pharmacogenomics: tumor heterogeneity and pharmacogenomics analysis of multi-omics data from tumor on the cloud." Bioinformatics 38.1 (2022): 275-277. doi:10.1093/bioinformatics/btab478
  5. S Lim, Y Lu, CY Cho, Y Kim, S Park and S Kim. "A review on compound-protein interaction prediction methods: data, format, representation and model." Computational and Structural Biotechnology Journal 19 (2021): 1541-1556. doi:10.1016/j.csbj.2021.03.004
  6. M Oh, S Park, S Kim and H Chae. "Machine learning-based analysis of multi-omics data on the cloud for investigating gene regulations." Briefings in Bioinformatics 22.1 (2021): 66-76. doi:10.1093/bib/bbaa032
  7. M Oh, S Park, S Lee, D Lee, S Lim, D Jeong, K Jo, I Jung and S Kim. "DRIM: a web-based system for investigating drug response at the molecular level by condition-specific multi-omics data integration." Frontiers in Genetics 11 (2020): 564792. doi:10.3389/fgene.2020.564792
  8. S Park, M Kim, S Seo, S Hong, K Han, K Lee, JH Cheon and S Kim. "A secure SNP panel scheme using homomorphically encrypted K-mers without SNP calling on the user side." BMC Genomics 20 (2019): 163-174. doi:10.1186/s12864-019-5473-z

Preprints

  1. Y Lu, S Lim, S Park, MG Choi, C Cho, S Kang and S Kim. "EnsDTI-kinase: Web-server for Predicting Kinase-Inhibitor Interactions with Ensemble Computational Methods and Its Applications." bioRxiv (2023): 2023-01. doi:10.1101/2023.01.06.523052

Proceedings

  1. TH Kwon, B Koo, S Park, T Southiratn and S Kim. "Web-based Exploratory Data Mining System for Analyzing the Gene-level Relationship between Intratumoral Heterogeneity of Promoter DNA Methylation and Drug Response." Proceedings of the Korean Institute of Information Scientists and Engineers (한국정보과학회 학술발표논문집) (2024): 363-365. dbpia.co.kr

Presentations

  • 2023 SNU Artificial Intelligence Institute Retreat
    "Dual representation learning for predicting drug-side effect frequency using protein target information."
  • AI for Drug Discovery Symposium, MOGAM Institute of Biomedical Research
    "BioVLAB-Cancer-Pharmacogenomics: Tumor heterogeneity and pharmacogenomics analysis of multi-omics data from tumor on the cloud."
  • The 6th SNU Bioinformatics Research Exchange Conference
    "Multi-omics integrative analysis pipelines of cancer pharmacogenomics."
  • ICGC ARGO 17th Scientific Workshop / 4th ARGO Meeting
    "BioVLAB-Cancer-Pharmacogenomics: Tumor heterogeneity and pharmacogenomics analysis of multi-omics data from tumor on the cloud."
  • The 17th Asia Pacific Bioinformatics Conference (APBC 2019)
    "A secure SNP panel scheme using homomorphically encrypted K-mers without SNP calling on the user side."

Awards & Honors

  • 2022 1H Talented Researcher Fellowship
    BK21 FOUR Graduate School Innovation Project
  • 2021 Star Student Researcher Award
    BK21 FOUR Intelligence Computing
  • Standigm/Korean Society for Bioinformatics Best Paper Award
    BIOINFO 2021, Korean Society for Bioinformatics
  • Second-tier Travel Fellowship
    The 17th Asia Pacific Bioinformatics Conference

Toy Projects

Implemented and deployed Naoki Homma's board game "Parade", accommodating 2-6 players per game. A round typically lasts 10-15 minutes. Currently, the interface and tutorials are available only in Korean.

Open competition for Parade playing logic. Features a tutorial, a real-time ongoing tournament, an Elo ranking system, and more. Currently, all information is available only in Korean.

Languages

  • Korean (native)
  • English (fluent)
  • Japanese (intermediate)