Ph.D. Student
University of California, Los Angeles
Ph.D. in Physics and Biology in Medicine Graduate Program
September 2022 - Present
Los Angeles, CA
Research Focus
Leading cutting-edge research in Vision-Language Foundation models for medical imaging applications. My work focuses on developing robust AI systems that can understand and reason about both visual and textual information, particularly in high-stakes medical domains where accuracy and reliability are paramount.
Currently pursuing a Ph.D. in Physics and Biology in Medicine, with a research emphasis on Vision-Language Foundation models, self-supervised learning techniques, and parameter-efficient fine-tuning for medical image analysis.
Major Research Projects
Vision-Language Foundation Models
Objective: Develop robust foundation models combining visual and textual understanding for accurate anatomical structure segmentation in CT scans, particularly focusing on lung segmentation and pulmonary disease detection.
Technical Approach
- Dual-loss Training: Combine a cosine similarity loss between image and text embeddings with a distillation loss that uses MedSAM as the teacher model (sketched below)
- Zero-shot Classification: Generate high-resolution, multi-channel probability maps for various anatomical regions
- Pseudo-labeling Pipeline: Use foundation model outputs to create training labels for fine-tuning the H-SAM architecture
- Knowledge Distillation: Transfer knowledge from large foundation models to efficient deployment models
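A minimal PyTorch sketch of the dual-loss objective, assuming paired image/text embeddings and a frozen MedSAM-style teacher that produces segmentation logits; the function, tensor names, and loss weighting here are illustrative assumptions, not the project's actual code:

```python
import torch.nn.functional as F

def dual_loss(image_emb, text_emb, student_logits, teacher_logits,
              alpha=0.5, tau=2.0):
    """Alignment loss on paired embeddings + distillation from a frozen teacher.

    image_emb, text_emb: (B, D) embeddings of matched image/report pairs.
    student_logits, teacher_logits: (B, C, H, W) segmentation logits;
    teacher_logits would come from MedSAM run under torch.no_grad().
    """
    # Cosine-similarity loss: push matched pairs toward similarity 1.
    align = 1.0 - F.cosine_similarity(image_emb, text_emb, dim=-1).mean()

    # Distillation loss: temperature-softened KL against the teacher.
    s = F.log_softmax(student_logits / tau, dim=1)
    t = F.softmax(teacher_logits / tau, dim=1)
    distill = F.kl_div(s, t, reduction="batchmean") * tau ** 2

    return align + alpha * distill
```

The temperature-scaled KL term is one common choice for the distillation component; the actual formulation and weighting of the two terms may differ.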
Expected Impact
This approach aims to outperform current state-of-the-art CT segmentation methods while providing comprehensive understanding of anatomical structures. The research has potential for direct clinical translation in pulmonary medicine.
Fine-tuning Vision Foundation Models with Deep Layer Adapters
Purpose: Explore parameter-efficient fine-tuning strategies for vision transformers in medical image segmentation tasks.
Methodology
- Dataset: VinDr-RibCXR (245 chest X-ray images for rib segmentation)
- Models: SAM-Adapter with ViT-B (12 blocks) and ViT-H (32 blocks)
- Training: 50 epochs, batch size 2, AdamW optimizer, IoU loss
- Experiment Design: Systematic placement of adapters starting from the deepest layers (illustrated in the sketch below)
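For illustration, a hedged PyTorch sketch of the adapter-placement scheme, assuming a standard bottleneck adapter and a ViT exposed as a list of blocks; SAM-Adapter's actual adapter design differs in its details:

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

def place_adapters_deepest(blocks, num_adapters, dim):
    """Wrap only the deepest `num_adapters` transformer blocks with adapters.

    blocks: nn.ModuleList of ViT blocks (12 for ViT-B, 32 for ViT-H).
    """
    start = len(blocks) - num_adapters
    return nn.ModuleList(
        nn.Sequential(block, Adapter(dim)) if i >= start else block
        for i, block in enumerate(blocks)
    )
```

With `num_adapters=7` on ViT-B's 12 blocks, only blocks 5-11 receive adapters; in parameter-efficient fine-tuning, the adapter weights are then typically the only trainable parameters.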
Key Findings
- ViT-B: optimal with 7-9 adapters in the deepest layers; F1 score 0.8240-0.8259 (vs 0.7773 baseline)
- ViT-H: optimal with 20-24 adapters in the deepest layers; F1 score 0.8457-0.8469 (vs 0.7697 baseline)
Key Insight: Adding adapters to roughly the deepest 60-75% of layers in both architectures yielded the best performance.
Self-Supervised Learning for Chest X-ray Segmentation
Challenge: Address limited annotated data in medical image segmentation by leveraging large-scale unannotated datasets.
Two-Phase Approach
- Phase 1: ConvNeXt-based DINO pretraining on 148,447 unannotated chest X-rays (objective sketched after this list)
- Phase 2: Fine-tuning on limited annotated datasets (3, 10, or 50 cases)
- Task: Multi-label segmentation of the lungs, cardiomediastinum, and airways
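A compact sketch of the DINO-style self-distillation objective behind Phase 1, with the multi-crop pipeline and projection heads omitted; the hyperparameter values and names are illustrative assumptions, not the exact training recipe:

```python
import torch
import torch.nn.functional as F

def dino_loss(student_out, teacher_out, center, tau_s=0.1, tau_t=0.04):
    """Cross-entropy between sharpened teacher and student distributions."""
    t = F.softmax((teacher_out - center) / tau_t, dim=-1).detach()
    s = F.log_softmax(student_out / tau_s, dim=-1)
    return -(t * s).sum(dim=-1).mean()

@torch.no_grad()
def ema_update(teacher, student, m=0.996):
    """Momentum (EMA) update of the teacher from the student, per step."""
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps, alpha=1 - m)

# The teacher starts as a deep copy of the student and is never backpropagated.
# Phase 2 (fine-tuning) can optionally freeze the pretrained encoder, e.g.:
# for p in model.encoder.parameters():
#     p.requires_grad = False
```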
Breakthrough Results
- Dramatic Improvement: DSC (Dice similarity coefficient) improved from 0.24 to 0.56 with only 3 annotated cases (metric sketched below)
- Consistent Benefits: Performance gains across all anatomical structures
- Model Scaling: Larger models consistently performed better
- Training Strategy: Freezing the pretrained encoder sometimes enhanced performance
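For reference, a minimal PyTorch implementation of the DSC metric quoted above, in its binary-mask form; a multi-label setup like this one would compute it per structure and average:

```python
import torch

def dice_score(pred, target, eps=1e-6):
    """Dice similarity coefficient (DSC) for binary masks.

    pred, target: {0, 1} tensors of the same shape.
    """
    pred = pred.float().flatten()
    target = target.float().flatten()
    inter = (pred * target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)
```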
Automatic Radiology Report Generation (RRG) Evaluation
Problem: Current evaluation methods for automatic radiology report generation fail to assess clinical factual accuracy or capture nuanced differences between reports.
Two-Stage Solution
Stage 1: JUN Metric Development
Develop an interpretable, component-based metric (JUN - Judging Understanding of Nuance) that assesses:
- Clinical concept accuracy
- Factual status verification
- Descriptive modifier correctness
Stage 2: Large-Scale Dataset Generation
Leverage JUN to generate a comprehensive dataset of (Reference, Candidate, Detailed Score Vector) tuples for training next-generation LLM-based evaluators.
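A hedged Python sketch of what one such training tuple might look like; the field names, example strings, and score ranges are assumptions for illustration, not JUN's actual schema:

```python
from dataclasses import dataclass

@dataclass
class JUNScoreVector:
    """Hypothetical per-report component scores in [0, 1]."""
    concept_accuracy: float      # clinical concepts correctly mentioned
    factual_status: float        # presence/absence/uncertainty verified
    modifier_correctness: float  # severity, location, size descriptors

@dataclass
class EvalExample:
    reference: str
    candidate: str
    scores: JUNScoreVector

example = EvalExample(
    reference="No focal consolidation. Mild cardiomegaly.",
    candidate="Heart size is mildly enlarged; lungs are clear.",
    scores=JUNScoreVector(concept_accuracy=1.0, factual_status=1.0,
                          modifier_correctness=1.0),
)
```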
Clinical Relevance
This work aims to provide robust tools for developing safer AI in clinical documentation, ensuring AI-generated reports meet the rigorous standards required in healthcare settings.
Research Publications from UCLA
Preprints & Journal Publications
- Jin Kim, Muhammad Wahi-Anwa, Sangyun Park, Shawn Shin, John M. Hoffman, Matthew S. Brown - "Autonomous Computer Vision Development with Agentic AI", arXiv
- Jin Kim, Matthew Brown, Dan Ruan - "Dual-path Radiology Report Generation: Fusing Pathology Classification with Language Model", MICCAI 2025 Workshop - Vision-Language Model for medical applications
- Lasse Hansen, ..., Jin Kim, Dan Ruan, ... - "Learn2Reg 2024: New Benchmark Datasets Driving Progress on New Challenges", Learn2Reg 2024 (under review)
Conference Papers
- Sangyun Park*, Jin Kim*, Yuchen Cui, Matthew Sherman Brown - "TRACE: Textual Reasoning for Affordance Coordinate Extraction", ICCV 2025 Workshop (under review) - Vision-Language Model application
- Sangyun Park*, Jin Kim* - "SPAR: Spatial Precision with Articulated Reasoning", ICCV 2025 Workshop (under review)
- Jin Kim, Matthew Brown, Dan Ruan - "Improving Foundation Models with Deep Layer Adapters for Medical Image Segmentation", RSNA 2024 (Oral) - Medical Image Analysis presentation
- Jin Kim, Matthew Brown, Dan Ruan - "Self-Supervised Learning Without Annotations to Improve Lung Chest X-Ray Segmentation", SPIE 2024 🏆 - Medical Image Analysis presentation
Key Achievements
SPIE 2024 Winner
1st place at SPIE 2024 Live Demonstrations Workshop for "SimpleMind: A Cognitive AI software environment"
15+ Publications
First author on multiple high-impact papers in top-tier venues including RSNA, SPIE, and MICCAI
PhD Qualifier
Passed qualifying exam on "Leveraging Foundation Models, Knowledge Distillation, and Pseudo-Labeling for Robust Lung Segmentation"
Research Mentorship
Mentored undergraduate students in deep learning and medical image analysis research projects