Commit

GITBOOK-363: change request with no subject merged in GitBook
fedorov authored and gitbook-bot committed Sep 4, 2024
1 parent 7b5c888 commit 5d964cc
Showing 3 changed files with 50 additions and 35 deletions.
Binary file added .gitbook/assets/2024-09-04_17-58-05.gif
60 changes: 43 additions & 17 deletions data/downloading-data/README.md
@@ -6,37 +6,48 @@ If you have questions or feedback about the download tools provided by IDC, plea

Depending on whether you would like to download data interactively or programmatically, we provide two recommended tools to help you.

### Interactive download: 3D Slicer SlicerIDCBrowser extension
### Command-line or programmatic download: idc-index python package

[3D Slicer](https://www.slicer.org/) is a free, open-source, cross-platform, extensible desktop application developed to support a variety of medical imaging research use cases.
[`idc-index`](https://github.com/ImagingDataCommons/idc-index) is a Python package designed to simplify access to IDC data. Assuming you have Python installed on your computer (if you do not, see the legacy download instructions [here](downloading-data-with-s5cmd.md)), you can install this package with `pip` like this:

IDC maintains [SlicerIDCBrowser](https://github.com/ImagingDataCommons/SlicerIDCBrowser), an extension of 3D Slicer, developed to support direct access to IDC data from your desktop. You will need to [install](https://download.slicer.org/) a recent 3D Slicer 5.7.0 preview application (installers are available for Windows, Mac and Linux), and then use the 3D Slicer ExtensionManager to install the SlicerIDCBrowser extension. Take a look at the quick demo video in [this post](https://discourse.canceridc.dev/t/sliceridcbrowser-extension-released/515) if you have never used the 3D Slicer ExtensionManager before.
```shell-session
pip install idc-index --upgrade
```

Once installed, you can use SlicerIDCBrowser in one of the two modes:
Once installed, you can use it to explore, search, select and download corresponding files as shown in the examples below. You can also take a look at a short tutorial on using `idc-index` [here](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/labs/idc\_rsna2023.ipynb).

1. **As an interface to explore IDC data**: you can select individual collections, cases and DICOM studies and download items of interest directly into 3D Slicer for subsequent visualization and analysis.
2. **As a download tool**: download IDC content based on the manifest you created using IDC Portal, or on identifiers of individual cases, DICOM studies or series.
#### Command line download interface

<figure><img src="../../.gitbook/assets/image (24).png" alt=""><figcaption><p>Copy identifiers for the studies/series of interest from the IDC Portal</p></figcaption></figure>
With the `idc-index` package you get command line scripts that aim to make download simple.

<figure><img src="../../.gitbook/assets/image (26).png" alt=""><figcaption><p>Insert the identifiers in the appropriate fields, or download content defined by the s5cmd manifest</p></figcaption></figure>
Have a .s5cmd manifest file you downloaded from IDC Portal or from the records in the IDC Zenodo community? Get the corresponding files as follows (you will also get a download progress bar, and the downloaded files will be organized in the collection/patient/study/series folder hierarchy!):

### Programmatic download: idc-index python package
```sh
idc download manifest_file.s5cmd
```
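To make the folder hierarchy mentioned above concrete, here is a minimal sketch that assembles the expected location of one series. The function name, the `downloads` root and the placeholder identifiers are all hypothetical, and the exact folder names produced by `idc download` may differ, so treat this only as an illustration of the collection/patient/study/series nesting.

```python
from pathlib import Path


def expected_series_dir(root, collection_id, patient_id, study_uid, series_uid):
    """Return the collection/patient/study/series folder under `root`."""
    return Path(root) / collection_id / patient_id / study_uid / series_uid


# Placeholder identifiers, for illustration only.
series_dir = expected_series_dir(
    "downloads", "pseudo_phi_dicom_data", "PATIENT_ID", "STUDY_UID", "SERIES_UID"
)
print(series_dir.as_posix())
```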

{% hint style="warning" %}
The `idc-index` package is under development, and its API may change in future releases!
{% endhint %}
You can use the same command to download files corresponding to any collection, patient, study or series, referred to by the identifiers you can copy from the portal!&#x20;

[`idc-index`](https://github.com/ImagingDataCommons/idc-index) is a python package designed to simplify access to IDC data.&#x20;

Once you have installed the package with `pip install idc-index`, you can use it to explore, search, select and download corresponding files as shown in the examples below.

You can also take a look at a short tutorial on using `idc-index` [here](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/labs/idc\_rsna2023.ipynb).
<figure><img src="../../.gitbook/assets/2024-09-04_17-58-05.gif" alt=""><figcaption><p>Copy collection ID from the IDC Portal interface</p></figcaption></figure>

```shell-session
pip install idc-index --upgrade
```sh
$ idc download pseudo_phi_dicom_data
2024-09-04 17:59:50,944 - Downloading from IDC v18 index
2024-09-04 17:59:50,952 - Identified matching collection_id: ['pseudo_phi_dicom_data']
2024-09-04 17:59:50,959 - Total size of files to download: 1.27 GB
2024-09-04 17:59:50,959 - Total free space on disk: 29.02233088GB
2024-09-04 17:59:51,151 - Not using s5cmd sync as the destination folder is empty or sync or progress bar is not requested
2024-09-04 17:59:51,156 - Initial size of the directory: 0 bytes
2024-09-04 17:59:51,156 - Approximate size of the files that need to be downloaded: 1274140000.0 bytes
Downloading data: 7%|█████ | 86.3M/1.27G [00:13<03:06, 6.36MB/s]
```

Similarly, you can copy identifiers for patient/study/series and download the corresponding content!
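
If you would rather drive the same command from a script, the following is a minimal sketch that wraps `idc download` with `subprocess`. The helper names are hypothetical and not part of `idc-index`; the only assumption about the command line is the `idc download <manifest-or-identifier>` form shown above.

```python
import shutil
import subprocess


def build_idc_download_command(target):
    """Build the argument list for `idc download <target>`.

    `target` is either a path to a .s5cmd manifest or an identifier
    (collection, patient, study or series) copied from the IDC Portal.
    """
    if not target or not target.strip():
        raise ValueError("target must be a manifest path or an identifier")
    return ["idc", "download", target.strip()]


def run_idc_download(target):
    """Run the download if the `idc` script is on PATH; return its exit code."""
    if shutil.which("idc") is None:
        raise RuntimeError("`idc` not found; install it with `pip install idc-index`")
    return subprocess.run(build_idc_download_command(target), check=False).returncode


# Only builds and prints the command; call run_idc_download() to start a download.
print(build_idc_download_command("pseudo_phi_dicom_data"))
```

Keeping command construction separate from execution makes it easy to log or review the exact command before any data is transferred.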

#### Programmatic download

```python
from idc_index import index

@@ -64,3 +75,18 @@ client.download_from_selection(seriesInstanceUID=\
```

`idc-index` includes a variety of other helper functions, such as download from the manifest created using IDC portal, automatic generation of the viewer URLs, information about disk space needed for a given collection, and more. We are very interested in your feedback to define the additional functionality to add to this package! Please reach out via [IDC Forum](https://discourse.canceridc.dev/) if you have any suggestions.
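
As an illustration of the disk space consideration, here is a minimal sketch that compares a required download size against the free space at the destination, using only the Python standard library. The function name is hypothetical and this is not the `idc-index` helper itself; the byte count is the approximate size reported in the `idc download` log earlier on this page.

```python
import shutil


def has_enough_space(required_bytes, destination="."):
    """Return True if `destination` has at least `required_bytes` free."""
    free_bytes = shutil.disk_usage(destination).free
    return free_bytes >= required_bytes


# Approximate size of pseudo_phi_dicom_data, taken from the log output above.
required = 1_274_140_000
print(has_enough_space(required))
```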

### Interactive download: 3D Slicer SlicerIDCBrowser extension

[3D Slicer](https://www.slicer.org/) is a free, open-source, cross-platform, extensible desktop application developed to support a variety of medical imaging research use cases.

IDC maintains [SlicerIDCBrowser](https://github.com/ImagingDataCommons/SlicerIDCBrowser), an extension of 3D Slicer, developed to support direct access to IDC data from your desktop. You will need to [install](https://download.slicer.org/) a recent 3D Slicer 5.7.0 preview application (installers are available for Windows, Mac and Linux), and then use the 3D Slicer ExtensionManager to install the SlicerIDCBrowser extension. Take a look at the quick demo video in [this post](https://discourse.canceridc.dev/t/sliceridcbrowser-extension-released/515) if you have never used the 3D Slicer ExtensionManager before.

Once installed, you can use SlicerIDCBrowser in one of the two modes:

1. **As an interface to explore IDC data**: you can select individual collections, cases and DICOM studies and download items of interest directly into 3D Slicer for subsequent visualization and analysis.
2. **As a download tool**: download IDC content based on the manifest you created using IDC Portal, or on identifiers of individual cases, DICOM studies or series.&#x20;

<figure><img src="../../.gitbook/assets/image (24).png" alt=""><figcaption><p>Copy identifiers for the studies/series of interest from the IDC Portal</p></figcaption></figure>

<figure><img src="../../.gitbook/assets/image (26).png" alt=""><figcaption><p>Insert the identifiers in the appropriate fields, or download content defined by the s5cmd manifest</p></figcaption></figure>
25 changes: 7 additions & 18 deletions data/downloading-data/downloading-data-with-s5cmd.md
@@ -119,28 +119,17 @@ s5cmd --no-sign-request --endpoint-url https://storage.googleapis.com cp s3://pu

Once `s5cmd` is installed, you can use the `s5cmd run` command to download the files corresponding to the manifest.

If you defined a manifest that references GCP buckets:

<pre class="language-bash" data-overflow="wrap"><code class="lang-bash">s5cmd --no-sign-request <a data-footnote-ref href="#user-content-fn-1">--endpoint-url https://storage.googleapis.com</a> run manifest_file_name
</code></pre>

If you defined a manifest that references AWS buckets:

<pre class="language-bash" data-overflow="wrap"><code class="lang-bash">s5cmd --no-sign-request <a data-footnote-ref href="#user-content-fn-2">--endpoint-url https://s3.amazonaws.com</a> run manifest_file_name
</code></pre>

{% hint style="info" %}
If you created the manifest using IDC Portal, you will have the instructions to install `s5cmd` and the exact command to download its content in the header of the manifest, which will look like this:

{% code overflow="wrap" %}
```
# To download the files in this manifest, first install s5cmd (https://github.com/peak/s5cmd),
# then run the following command:
# s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com run cohorts_996_20230505_72608_aws.s5cmd
```bash
s5cmd --no-sign-request run manifest_file_name
```
{% endcode %}
{% endhint %}

[^1]: Use this endpoint for accessing GCS buckets
If you defined a manifest that references GCP buckets, you will need to specify the GCS endpoint:

[^2]: Use this endpoint for accessing AWS buckets
<pre class="language-bash" data-overflow="wrap"><code class="lang-bash">s5cmd --no-sign-request <a data-footnote-ref href="#user-content-fn-1">--endpoint-url https://storage.googleapis.com</a> run manifest_file_name
</code></pre>
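
The choice between the two command forms can be captured in a small helper. This is a minimal sketch that only assembles the `s5cmd` argument list. The function name and the `provider` parameter are hypothetical. The flags and the GCS endpoint URL are the ones shown in the commands on this page.

```python
GCS_ENDPOINT = "https://storage.googleapis.com"


def build_s5cmd_run_command(manifest_path, provider="aws"):
    """Build the `s5cmd run` argument list for an AWS or GCS manifest."""
    command = ["s5cmd", "--no-sign-request"]
    if provider == "gcs":
        # Manifests that reference GCP buckets need the GCS endpoint.
        command += ["--endpoint-url", GCS_ENDPOINT]
    elif provider != "aws":
        raise ValueError(f"unknown provider: {provider!r}")
    return command + ["run", manifest_path]


print(" ".join(build_s5cmd_run_command("manifest_file_name")))
print(" ".join(build_s5cmd_run_command("manifest_file_name", provider="gcs")))
```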

[^1]: Use this endpoint for accessing GCS buckets
