revert changes (NVIDIA#8410) (NVIDIA#8411)
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>
2 people authored and sashameister committed Feb 15, 2024
1 parent 6e0a8f6 commit 2456b5a
Showing 1 changed file with 9 additions and 18 deletions.
27 changes: 9 additions & 18 deletions tutorials/multimodal/Multimodal Data Preparation.ipynb
@@ -2,27 +2,19 @@
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Multimodal Dataset Preparation\n",
"\n",
"First step of pre-training any deep learning model is data preparation. This notebook will walk you through 5 stages of data preparation for training a multimodal model: \n",
"The first step of pre-training any deep learning model is data preparation. This notebook will walk you through the 5 stages of data preparation for training a multimodal model:\n",
"1. Download your Data\n",
"2. Extract Images and Text\n",
"3. Re-organize to ensure uniform text-image pairs\n",
"4. Precache Encodings\n",
"5. Generate Metadata required for training\n",
"\n",
"This notebook will show you how to prepare an image-text dataset into the [WebDataset](https://github.com/webdataset/webdataset) format. The Webdataset format is required to train all multimodal models in NeMo, such as Stable Diffusion and Imagen. \n",
"\n",
"This notebook is designed to demonstrate the different stages of multimodal dataset preparation. It is not meant to be used to process large-scale datasets since many stages are too time-consuming to run without parallelism. For large workloads, we recommend running the multimodal dataset preparation pipeline with the NeMo-Megatron-Launcher on multiple processors/GPUs. NeMo-Megatron-Launcher packs the same 5 scripts in this notebook into one runnable command and one config file to enable a smooth and a streamlined workflow.\n",
"\n",
"Depending on your use case, not all 5 stages need to be run. Please go to (TODO doc link) for an overview of the 5 stages.\n",
" \n",
"We will use a [dummy dataset](https://huggingface.co/datasets/cuichenx/dummy-image-text-dataset) as the dataset example throughout this notebook. This dataset is formatted as a table with one column storing the text captions, and one column storing the URL link to download the corresponding image. This is the same format as most common text-image datasets. The use of this dummy dataset is for demonstration purposes only. **Each user is responsible for checking the content of the dataset and the applicable licenses to determine if it is suitable for the intended use.**\n",
"\n",
"Let's first set up some paths."
]
"5. Generate Metadata required for training\n"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
@@ -58,13 +50,12 @@
"id": "c06f3527",
"metadata": {},
"source": [
"# Multimodal Dataset Preparation\n",
"\n",
"This notebook will show you how to prepare an image-text dataset into the [WebDataset](https://github.com/webdataset/webdataset) format. The Webdataset format is required to train all multimodal models in NeMo, such as Stable Diffusion and Imagen. \n",
"\n",
"This notebook is designed to demonstrate the different stages of multimodal dataset preparation. It is not meant to be used to process large-scale datasets since many stages are too time-consuming to run without parallelism. For large workloads, we recommend running the multimodal dataset preparation pipeline with the NeMo-Megatron-Launcher on multiple processors/GPUs. NeMo-Megatron-Launcher packs the same 5 scripts in this notebook into one runnable command and one config file to enable a smooth and a streamlined workflow.\n",
"\n",
"Depending on your use case, not all 5 stages need to be run. Please go to (TODO doc link) for an overview of the 5 stages.\n",
"Depending on your use case, not all 5 stages need to be run. Please go to [NeMo Multimodal Documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/multimodal/text2img/datasets.html) for an overview of the 5 stages.\n",
" \n",
"We will use a [dummy dataset](https://huggingface.co/datasets/cuichenx/dummy-image-text-dataset) as the dataset example throughout this notebook. This dataset is formatted as a table with one column storing the text captions, and one column storing the URL link to download the corresponding image. This is the same format as most common text-image datasets. The use of this dummy dataset is for demonstration purposes only. **Each user is responsible for checking the content of the dataset and the applicable licenses to determine if it is suitable for the intended use.**\n",
"\n",
@@ -413,7 +404,7 @@
"id": "27b26036",
"metadata": {},
"source": [
"Let's download an example precaching config file ## TODO modify this path"
"Let's download an example precaching config file"
]
},
{
@@ -425,7 +416,7 @@
},
"outputs": [],
"source": [
"! wget TODO_github_link/precache_sd.yaml -P $CONF_DIR/"
"! wget https://github.com/NVIDIA/NeMo-Megatron-Launcher/blob/master/launcher_scripts/conf/data_preparation/multimodal/precache_sd.yaml -P $CONF_DIR/"
]
},
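Once the config file has been fetched by the wget cell above, it is ordinary YAML. The following minimal sketch is not part of the notebook; it simply loads the file and lists its top-level keys, assuming it was saved as raw YAML under the notebook's CONF_DIR and that PyYAML is installed.

```python
# Illustrative sketch only: load the downloaded precaching config and list its
# top-level keys. Assumes CONF_DIR matches the path used in the wget cell above
# and that the file was fetched as raw YAML rather than an HTML page.
import os
import yaml

conf_dir = os.environ.get("CONF_DIR", "conf")  # assumed location; adjust as needed
with open(os.path.join(conf_dir, "precache_sd.yaml")) as f:
    precache_cfg = yaml.safe_load(f)

print(list(precache_cfg.keys()))  # top-level sections of the precaching config
```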
{