Official code for the Paper "RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance"

ChantalMP/RaDialog

RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance

Authors: Chantal Pellegrini*, Ege Özsoy*, Benjamin Busam, Nassir Navab, Matthias Keicher

✨ News ✨

  • 12 July 2024: We published a new version of our Instruct Dataset, including additional tasks, on PhysioNet
  • 29 May 2024: The LLaVA version of RaDialog is now publicly available on Hugging Face and GitHub. This new version is much better at conversational assistance, is easier to use, and allows a simple inference setup with Hugging Face!
  • 26 March 2024: RaDialog Instruct Dataset now available on PhysioNet!

Conversational AI tools that can generate and discuss clinically correct radiology reports for a given medical image have the potential to transform radiology. Such a human-in-the-loop radiology assistant could facilitate a collaborative diagnostic process, thus saving time and improving the quality of reports. Towards this goal, we introduce RaDialog, the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog. RaDialog effectively integrates visual image features and structured pathology findings with a large language model (LLM) while simultaneously adapting it to a specialized domain using parameter-efficient fine-tuning. To keep the conversational abilities of the underlying LLM, we propose a comprehensive, semi-automatically labeled, image-grounded instruct dataset for chest X-ray radiology tasks. By training with this dataset, our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions, serving as a foundational step toward clinical dialog systems.

Installation

Environment Setup:

1) RaDialog Environment

  • Clone this repository and move to the RaDialog directory with cd RaDialog
  • Install the RaDialog environment with conda create --name radialog python=3.7
  • Activate the environment with conda activate radialog
  • Install the requirements with pip install -r requirements.txt
  • Install hi-ml-multimodal with pip install hi-ml-multimodal==0.2.0
  • Reinstall correct versions of torch and transformers with pip install torch==1.13.0 transformers==4.28.1
  • Install java and set JAVA_HOME and PATH in local_config.py (we used jre1.8.0)
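
The Java step above amounts to pointing JAVA_HOME at the JRE and putting its bin directory first on PATH. The sketch below only illustrates that idea: the helper java_env and the install location /opt/jre1.8.0 are hypothetical, and the variable names that local_config.py actually expects should be taken from the file in the repository.

```python
import os
import shutil


def java_env(java_home, base_path):
    """Return JAVA_HOME plus a PATH with <java_home>/bin prepended (hypothetical helper)."""
    bin_dir = os.path.join(java_home, "bin")
    return {"JAVA_HOME": java_home, "PATH": bin_dir + os.pathsep + base_path}


# "/opt/jre1.8.0" is an assumed install location; replace it with your own.
env = java_env("/opt/jre1.8.0", os.environ.get("PATH", ""))

# shutil.which returns None when no java executable is found on that PATH.
print("java found at:", shutil.which("java", path=env["PATH"]))
```

If the last line prints None, the path passed as java_home most likely does not contain a bin/java executable.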

2) CheXbert Environment

  • Install the CheXbert environment with conda create --name chexbert python=3.7
  • Activate the environment with conda activate chexbert
  • Move to the chexbert directory with cd chexbert
  • Install the requirements with pip install -r requirements.txt
  • Set the absolute path to the chexbert env and folder in RaDialog/local_config.py

Prepare the Data and Models:

1) Download pretrained models

  • Download the pretrained models from here
  • place chexbert.pth in RaDialog/chexbert/src/checkpoint/
  • unzip vicuna-7b-img-instruct.zip and vicuna-7b-img-report.zip and place folders into RaDialog/checkpoints/
  • unzip chexpert_train and place folder into RaDialog/findings_classifier/checkpoints/
  • unzip embs and place folder into RaDialog/pretraining/
  • unzip checkpoint_4.pth and place it into outputs/stage1_pt_instruct_blip_origlr_img448/
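
Because the downloads end up in several different folders, a quick check for missing files can save a failed run later. The following is a minimal sketch, not part of the repository. It assumes that the unzipped folders keep the names of their archives and that the outputs/ folder sits inside the RaDialog directory; the helper missing_paths is hypothetical.

```python
from pathlib import Path

# Locations from the list above, relative to the RaDialog directory.
# Folder names after unzipping are assumed to match the archive names.
EXPECTED = [
    "chexbert/src/checkpoint/chexbert.pth",
    "checkpoints/vicuna-7b-img-instruct",
    "checkpoints/vicuna-7b-img-report",
    "findings_classifier/checkpoints/chexpert_train",
    "pretraining/embs",
    "outputs/stage1_pt_instruct_blip_origlr_img448/checkpoint_4.pth",
]


def missing_paths(root):
    """Return the expected files or folders that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]


# Run from inside the RaDialog directory; an empty list means everything is in place.
print("missing:", missing_paths("."))
```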

2) Download MIMIC-CXR

  • Download the MIMIC-CXR-JPG dataset from here
  • The dataset should be saved in .../physionet.org/files/mimic-cxr-jpg
  • Go to physionet.org/files/mimic-cxr-jpg/files/ and unzip mimic-cxr-2.0.0-split.csv.gz
  • from here, download mimic-cxr-reports.zip
  • unzip it and place the folder in the same directory as the MIMIC-CXR-JPG dataset (e.g. physionet.org/files/)
  • in local_config.py set the path to the MIMIC-CXR dataset (e.g. .../physionet.org/files/)
  • in model/lavis/defaults_report.yaml set the path to the MIMIC-CXR-JPG dataset (e.g. .../physionet.org/files/mimic-cxr-jpg/2.0.0 )
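
If gunzip is not available on your system, the split file can also be decompressed with the Python standard library. This is a small sketch under that assumption; the helper name gunzip is ours, and the path in the usage comment keeps the elided prefix from the list above, which you need to replace with your own location.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src):
    """Decompress <name>.gz next to the original and return the new path."""
    src = Path(src)
    dst = src.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


# Usage (adjust the prefix to your download location):
# gunzip(".../physionet.org/files/mimic-cxr-jpg/files/mimic-cxr-2.0.0-split.csv.gz")
```

The compressed file is left in place, so the step can be repeated safely.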

3) Create sectioned report data

  • go to the mimic-cxr folder in the code with cd mimic-cxr
  • run python create_section_files.py to prepare the report data
  • go back to the RaDialog directory with cd ..

4) Prepare the instruct dataset

  • As MIMIC-CXR can only be accessed with a certified PhysioNet account, we cannot publish our instruct dataset directly.
  • The instruct dataset is now available on PhysioNet (see News). Alternatively, you can create it yourself by following the steps below, or just use our pre-trained model.
  • The MIMIC-NLE data has to be generated first, as it also contains protected data. Follow the instructions here to generate the MIMIC-NLE data and set the path to the MIMIC-NLE data in local_config.py.
  • For the correction task, contact us and we can share the incorrect predictions we used.
  • To generate data without Correction or Reasoning (MIMIC-NLE), please comment out line 335 or 336 in "create_data.py" accordingly.

Data for RaDialog-RG:

  • run python -m data.create_data --mode "RG" to generate the report generation dataset in the required format (no instruct data)

Data for RaDialog-INS:

  • run python -m data.create_data --mode "INS" to generate the instruct dataset

Run Demo:

  • run python demo.py --cfg-path pretraining/configs/blip2_pretrain_stage1_emb.yaml to start the demo
  • connect to the demo with a browser at http://127.0.0.1:7860 and start chatting with RaDialog

Evaluate RaDialog on MIMIC-CXR test set:

  • RaDialog-RG: run python test.py --prompt img_matching_examples_ig2_noexamples_IMG_findings --use_embs --num_workers 0 --lora_model checkpoints/vicuna-7b-img-report/checkpoint-11200
  • RaDialog-INS: run python test.py --prompt img_matching_examples_ig2_noexamples_IMG_findings --use_embs --num_workers 0 --lora_model checkpoints/vicuna-7b-img-instruct/checkpoint-4800
  • RaDialog-INS (correction): run python test.py --prompt img_matching_examples_ig2_noexamples_IMG_findings --use_embs --num_workers 0 --lora_model checkpoints/vicuna-7b-img-instruct/checkpoint-4800 --do_corr
  • RaDialog-INS (findings QA): run python test.py --prompt img_matching_examples_ig2_noexamples_IMG_findings --use_embs --num_workers 0 --lora_model checkpoints/vicuna-7b-img-instruct/checkpoint-4800 --do_cp_all_qa (or --do_cp_bin_qa)
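
The four evaluation commands above share most of their arguments and differ only in the LoRA checkpoint and one optional flag. The sketch below assembles them from the shared parts so they can be printed or passed to subprocess.run; it uses only the flags listed above, while the helper eval_command and the RUNS table are our own illustration and not part of the repository.

```python
# Arguments shared by every evaluation run listed above.
COMMON = [
    "python", "test.py",
    "--prompt", "img_matching_examples_ig2_noexamples_IMG_findings",
    "--use_embs",
    "--num_workers", "0",
]
RG_CKPT = "checkpoints/vicuna-7b-img-report/checkpoint-11200"
INS_CKPT = "checkpoints/vicuna-7b-img-instruct/checkpoint-4800"


def eval_command(lora_model, extra_flag=None):
    """Build the argument list for one evaluation run (hypothetical helper)."""
    cmd = COMMON + ["--lora_model", lora_model]
    if extra_flag:
        cmd.append(extra_flag)
    return cmd


RUNS = {
    "RaDialog-RG": eval_command(RG_CKPT),
    "RaDialog-INS": eval_command(INS_CKPT),
    "RaDialog-INS (correction)": eval_command(INS_CKPT, "--do_corr"),
    "RaDialog-INS (findings QA, all)": eval_command(INS_CKPT, "--do_cp_all_qa"),
    "RaDialog-INS (findings QA, binary)": eval_command(INS_CKPT, "--do_cp_bin_qa"),
}

for name, cmd in RUNS.items():
    # No argument contains spaces, so a plain join gives a valid shell command.
    print(name + ": " + " ".join(cmd))
```

To execute a run instead of printing it, pass the list to subprocess.run from inside the RaDialog directory with the radialog environment active.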

Train RaDialog:

1) CheXbert classifier Training

  • run python -m findings_classifier.chexpert_train --train --run_name "train_chexbert"
  • in chexpert_train.py set ckpt_path (line 152) to the path of the model you just trained
  • then run python -m findings_classifier.chexpert_train --run_name "save_preds" to save the predictions of the trained model

2) Alignment Module Pretraining

  • run python -m pretraining.train --cfg-path pretraining/configs/blip2_pretrain_stage1.yaml to train the alignment module (we used the 4th epoch checkpoint)
  • run python -m pretraining.train --cfg-path pretraining/configs/blip2_pretrain_stage1_emb.yaml to save the embeddings of the trained model

3) LLM Training

Train RaDialog-RG:

  • run python finetune.py --use_embs True --base_model 'vicuna_v7' --output_dir 'checkpoints/lora-vicuna-7b-report' --wandb_run_name lora-vicuna-7b-report --prompt_template_name vicuna_v11 --data_path "data/data_files/mimic_cxr_reports_stratified.json" --cutoff_len 600 --num_epochs 10
  • we used checkpoint-11200

Train RaDialog-INS:

  • run python finetune.py --use_embs True --base_model 'vicuna_v7' --output_dir 'checkpoints/lora-vicuna-7b-instruct' --wandb_run_name lora-vicuna-7b-instruct --prompt_template_name vicuna_v11 --data_path "data/data_files/mimic_cxr_instruct_stratified.json" --cutoff_len 800 --num_epochs 10
  • we used checkpoint-4800

To use a model from a checkpoint, you'll need to perform the following steps:

  • make a copy of "pytorch_model.bin" and rename it to "adapter_model.bin"
  • copy adapter_config.json to the checkpoint folder (it will be generated after the last epoch or you can copy it from the checkpoints we provide)
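
The two checkpoint steps above can be scripted as follows. This is a minimal sketch rather than a script from the repository: the helper prepare_checkpoint is hypothetical, and the paths in the usage comment assume the output directory from the training command and one of the provided checkpoints as the source of adapter_config.json.

```python
import shutil
from pathlib import Path


def prepare_checkpoint(ckpt_dir, adapter_config):
    """Make a training checkpoint folder loadable as a LoRA adapter."""
    ckpt_dir = Path(ckpt_dir)
    # Step 1: copy pytorch_model.bin under the adapter file name.
    shutil.copyfile(ckpt_dir / "pytorch_model.bin", ckpt_dir / "adapter_model.bin")
    # Step 2: add adapter_config.json unless the folder already has one.
    target = ckpt_dir / "adapter_config.json"
    if not target.exists():
        shutil.copyfile(adapter_config, target)
    return ckpt_dir


# Usage (paths are assumptions, adjust to your setup):
# prepare_checkpoint(
#     "checkpoints/lora-vicuna-7b-report/checkpoint-11200",
#     "checkpoints/vicuna-7b-img-report/checkpoint-11200/adapter_config.json",
# )
```

The original pytorch_model.bin is kept, matching the "make a copy" wording above.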

Reference

When using our model or dataset, please cite:

@article{pellegrini2023radialog,
  title={RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance},
  author={Pellegrini, Chantal and {\"O}zsoy, Ege and Busam, Benjamin and Navab, Nassir and Keicher, Matthias},
  journal={arXiv preprint arXiv:2311.18681},
  year={2023}
}
