When preparing medical imaging data for deep learning, researchers often need to extract all imagery from a large dataset of DICOM files. DICOM Media Exporter (DME) makes this quick and easy. DME extracts all image and video data from a collection of DICOM files and saves them to disc. The tool also exports all metadata contained in the DICOM files in json format.
All of DME's settings are passed via a yaml configuration file (see config.yml
section 👇 for details). This config includes the location of your DICOM files, where you want to save the outputs, and details about the output format. Run DME using
python3 dme.py config.yml
This command will find all DICOM files in a specified directory (optionally recursively) and will export them as media files to a specified output location. If used recursively, the file structure of the input directory will be mirrored in the output directory.
- Tested for Ultrasound Image and Ultrasound Video DICOM files, but may work for others. Please submit an issue if you run into an unsupported DICOM type.
- Uses multiprocessing to speed things up ⚡. This feature has not been tested on Windows.
- Currently DICOM Media Exporter only exports still images and videos. If your application requires exporting other media types (audio for example), please submit an issue.
- DME uses Pillow to encode images and FFmpeg to encode video. DME supports any output format options supported by those libraries by listing key-value pair options in the configuration file.
DME does not currently provide a package installation method. To install, simply clone this repository
git clone https://github.com/MedAI-Clemson/dicom-media-exporter.git
and install the python dependencies listed in requirements.txt
file. Install them (in a virtual environment) using
python -m pip install -r requirements.txt
Note: DME depends on FFmpeg (see FFmpeg section below)
num_workers: 4 # number of parallel processes used for the conversion. Use 0 to avoid creating child processes.
dicom:
dir: <input directory path> # directory containing DICOM files
recursive: true # whether to recursively search under `dir`
media:
dir: <output directory path> # directory to write media files
overwrite: false # whether to overwrite existing media files with matching filepaths. Skips matching files if false.
metadata:
file: <output file path> # JSON-lines file (usually *.jsonl) containing DICOM metadata
append: false # whether to append if `file` already exists. If false and `file` exists, raises an error to avoid data duplication
config:
file: <output file path> # location to save a copy of provided config file for reproducibility
video:
extension: ".mp4" # file extension for video files
save_method_kwargs:
encoding_format: "h264" # the encoding format passed to ffmpeg
encoding_args: # any additional encoding options passed to ffmpeg
pix_fmt: "yuv420p"
options:
crf: "17"
image:
extension: ".png" # file extension for image files. Image encoding is inferred from extension.
save_method_kwargs:
pil_writer_params: {} # parameters for the Pillow image save function corresponding to `extension`
DME saves all DICOM fields to the provided metadata file in JSON Lines format. The resulting file will have one json object per line corresponding to a single DICOM file. These json objects contain key-value pairs representing the DICOM metadata. The key names match the full field name provided in the DICOM file which should correspond to those defined in the DICOM standard.
Note: DME ignores DICOM fields with value representations "SQ", "OB", and "OW". These are excluded due to their large size and because these fields usually store media, which is saved separately by DME.
In addition to the DICOM fields, the exported metadata includes the following additional key-value pairs for linking to the input data and exported media:
dicom_file
: the location of the input dicom file relative to the provided root DICOM directory.media_file
: the location of the output media file relative to the provided root media directory.media_type
: the type of output media, eithervideo
orimage
DME uses FFmpeg to encode DICOM multi-frame images as video files. Before using DME, FFmpeg must be installed and accessible via the $PATH
environment variable.
There are a variety of ways to install FFmpeg:
- The official download links.
- Your package manager of choice (e.g.
sudo apt install ffmpeg
on Debian/Ubuntu,brew install ffmpeg
on OS X, etc.). - For Palmetto Cluster users, FFmpeg can be loaded using
module load ffmpeg/<version>
.
Regardless of how FFmpeg is installed, you can check if your environment path is set correctly by running the ffmpeg
command from the terminal, in which case the version information should appear, as in the following example (truncated for brevity):
$ ffmpeg
ffmpeg version 4.4.1 Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 9.5.0 (Spack GCC)
Note: The actual version information displayed here may vary from one system to another; but if a message such as
ffmpeg: command not found
appears instead of the version information, FFmpeg is not properly installed.
- Support use of Python API to export files
- Support the export of 3d data formats
- Support the export of video/3d as folders with image/slice frames
- Test on Windows machine
- Use proper logger
- Support audio export
- Support conversion of database from Orthanc server