Welcome to my personal repository, a curated collection of cutting-edge research at the intersection of machine learning and healthcare. As an AI researcher with a strong interest in healthcare applications, I've compiled this repository to showcase innovative work, mostly in natural language processing (NLP) and multimodal learning, within the healthcare domain. While this collection reflects my personal research focus, it aims to serve as a valuable resource for anyone passionate about applying machine learning to healthcare. I welcome contributions and discussions, so feel free to share ideas or suggest papers!
- (2023/11) MEDITRON-70B: Scaling Medical Pretraining for Large Language Models [paper]
- Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare [paper]
- Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks [paper]
- Health-LLM: Large language models for health prediction via wearable sensor data [paper]
- (2022/03) MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering [paper]
- (2023/07) Med-HALT: Medical Domain Hallucination Test for Large Language Models [paper]
- (2024/01) K-QA: A Real-World Medical Q&A Benchmark [paper]
- (2024/05) MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain [paper]
- (2023/11) MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning [paper]
- (2024/02) AI Hospital: Interactive Evaluation and Collaboration of LLMs as Intern Doctors for Clinical Diagnosis [paper]
- (2024/02) AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning [paper]
- (2024/04) Adaptive Collaboration Strategy for LLMs in Medical Decision Making [paper]
- (2024/05) Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents [paper]
- (2024/05) AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments [paper]
- (2024/05) DrHouse: An LLM-empowered Diagnostic Reasoning System through Harnessing Outcomes from Sensor Data and Expert Knowledge [paper]
- (2024/06) ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World [paper]
- (2024/07) MMedAgent: Learning to Use Medical Tools with Multi-modal Agent [paper]
- (2024/08) MEDCO: Medical Education Copilots Based on A Multi-Agent Framework [paper]
- (2024/08) Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions [paper]
- (2017/03) Generating Multi-label Discrete Patient Records using Generative Adversarial Networks [paper]
- (2010/10) Data-driven approach for creating synthetic electronic medical records [paper]
- (2023/03) EHRDiff: Exploring Realistic EHR Synthesis with Diffusion Models [paper]
- (2023/04) Synthesize High-dimensional Longitudinal Electronic Health Records via Hierarchical Autoregressive Language Model [paper]
- (2023/08) EHR-Safe: generating high-fidelity and privacy-preserving synthetic electronic health records [paper]
- LLMSYN: Generating Synthetic Electronic Health Records Without Patient-Level Data [paper]
- GenHPF: General Healthcare Predictive Framework with Multi-task Multi-source Learning [paper]
- REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models [paper]
- EMERGE: Integrating RAG for Improved Multimodal EHR Predictive Modeling [paper]
- EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models [paper]
- MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images [paper]
- Learning Missing Modal Electronic Health Records with Unified Multi-modal Data Embedding and Modality-Aware Attention [paper]
- From Basic to Extra Features: Hypergraph Transformer Pretrain-then-Finetuning for Balanced Clinical Predictions on EHR [paper]
- FlexCare: Leveraging Cross-Task Synergy for Flexible Multimodal Healthcare Prediction [paper]
- MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models [paper]
- Multimodal Patient Representation Learning with Missing Modalities and Labels [paper]
- (2015/03) Toward a Natural Language Interface for EHR Questions, Roberts and Demner-Fushman, 2015 [Paper]
- (2016/05) Annotating logical forms for EHR questions, Roberts and Demner-Fushman, 2016 [Paper]
- (2018/04) A Semantic Parsing Method for Mapping Clinical Questions to Logical Forms, Roberts and Demner-Fushman [Paper]
- (2018/09) emrQA: A Large Corpus for Question Answering on Electronic Medical Records, Pampari et al., 2018 [Paper]
- (2019/08) Text-to-SQL Generation for Question Answering on Electronic Medical Records, Wang et al., 2019 [Paper]
- (2020/03) Using FHIR to Construct a Corpus of Clinical Questions Annotated with Logical Forms and Answers, Soni et al., 2020 [Paper]
- (2020/05) Dataset and Enhanced Model for Eligibility Criteria-to-SQL Semantic Parsing, Yu et al., 2020 [Paper]
- (2020/05) Paraphrasing to improve the performance of Electronic Health Records Question Answering, Soni and Roberts, 2020 [Paper]
- (2020/10) Knowledge Graph-based Question Answering with Electronic Health Records, Park et al., 2020 [Paper]
- (2021/06) emrKBQA: A Clinical Knowledge-Base Question Answering Dataset, Raghavan et al., 2021 [Paper]
- (2021/11) Question Answering for Complex Electronic Health Records Database using Unified Encoder-Decoder Architecture, Bae et al., 2021 [Paper]
- (2022/03) Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records, Kim et al., 2022 [Paper]
- (2022/05) DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries, Bardhan et al., 2022 [Paper]
- (2022/06) Learning to Ask Like a Physician, Lehman et al., 2022 [Paper]
- (2022/06) RadQA: A Question Answering Dataset to Improve Comprehension of Radiology Reports, Soni et al., 2022 [Paper]
- (2023/01) EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records, Lee et al., 2023 [Paper]
- (2023/04) LeafAI: query generator for clinical cohort discovery rivaling a human programmer, Dobbins et al., 2023 [Paper]
- (2023/04) Toward a Neural Semantic Parsing System for EHR Question Answering, Soni and Roberts [Paper]
- (2023/04) quEHRy: a question answering system to query electronic health records, Soni et al., 2023 [Paper]
- (2023/06) ECG-QA: A Comprehensive Question Answering Dataset Combined With Electrocardiogram, Oh et al., 2023 [Paper]
- (2023/08) MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records, Fleming et al., 2023 [Paper]
- (2023/09) Adapted Large Language Models Can Outperform Medical Experts in Clinical Text Summarization, Van Veen et al., 2023 [Paper]
- (2023/10) Question Answering for Electronic Health Records: A Scoping Review of datasets and models, Bardhan et al., 2023 [Paper]
- (2023/10) EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images, Bae and Kyung et al., 2023 [Paper]
- (2024/01) EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records, Shi and Xu et al., 2024 [Paper]
- (2024/02) EHRNoteQA: A Patient-Specific Question Answering Benchmark for Evaluating Large Language Models in Clinical Settings, Kweon and Kim et al., 2024 [Paper]
- (2024/03) A Benchmark of Domain-Adapted Large Language Models for Generating Brief Hospital Course Summaries, Aali et al., 2024 [Paper]
- Explainable Automated Fact-Checking for Public Health Claims [paper]
- Evidence-based Fact-Checking of Health-related Claims [paper] [code]
- HealthFC: Verifying Health Claims with Evidence-Based Medical Fact-Checking [paper]
- DOSSIER: Fact checking in electronic health records while preserving patient privacy [paper] [code]
- EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records [paper] [code]
- (2019/12) MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports [paper]
- (2019/01) MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs [paper]
- (2023/10) Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge [paper]
- (2024/03) A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities [paper]
- (2024/04) RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis [paper]
- (2024/06) Shadow and Light: Digitally Reconstructed Radiographs for Disease Classification [paper]
- (2024/08) MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine [paper]
- (2024/01) CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation [paper]
- (2024/05) Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [paper]
- (2020/04) CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT [paper]
- (2021/06) RadGraph: Extracting Clinical Entities and Relations from Radiology Reports [paper]
- (2023/08) RadGraph2: Modeling Disease Progression in Radiology Reports via Hierarchical Information Extraction [paper]
- (2023/09) Evaluating progress in automatic chest x-ray radiology report generation [paper]
- (2023/11) Radiology-Aware Model-Based Evaluation Metric for Report Generation [paper]
- (2024/03) Evaluating GPT-4V (GPT-4 with Vision) on Detection of Radiologic Findings on Chest Radiographs [paper]
- (2024/04) LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation [paper]
- (2024/05) GREEN: Generative Radiology Report Evaluation and Error Notation [paper]