Commit

GITBOOK-349: change request with no subject merged in GitBook
fedorov authored and gitbook-bot committed Apr 23, 2024
1 parent e53652a commit e86819a
Showing 1 changed file with 11 additions and 11 deletions.
22 changes: 11 additions & 11 deletions data/downloading-data/downloading-data-with-s5cmd.md
@@ -31,45 +31,45 @@ Queries below demonstrate how to get the Google Storage URLs to download cohort

{% code overflow="wrap" %}
```sql
-# Select all files from GCS for a given PatientID
-SELECT DISTINCT(CONCAT("cp s3://", SPLIT(gcs_url,"/")[SAFE_OFFSET(2)], "/", crdc_series_uuid, "/* ."))
+# Select all files for a given PatientID
+SELECT DISTINCT(CONCAT(series_aws_url, "* ."))
FROM `bigquery-public-data.idc_current.dicom_all`
WHERE PatientID = "LUNG1-001"
```
{% endcode %}

{% code overflow="wrap" %}
```sql
-# Select all files from GCS for a given collection
-SELECT DISTINCT(CONCAT("cp s3://", SPLIT(gcs_url,"/")[SAFE_OFFSET(2)], "/", crdc_series_uuid, "/* ."))
+# Select all files for a given collection
+SELECT DISTINCT(CONCAT(series_aws_url, "* ."))
FROM `bigquery-public-data.idc_current.dicom_all`
WHERE collection_id = "nsclc_radiomics"
```
{% endcode %}

{% code overflow="wrap" %}
```sql
-# Select all files from GCS for a given DICOM series
-SELECT DISTINCT(CONCAT("cp s3://", SPLIT(gcs_url,"/")[SAFE_OFFSET(2)], "/", crdc_series_uuid, "/* ."))
+# Select all files for a given DICOM series
+SELECT DISTINCT(CONCAT(series_aws_url, "* ."))
FROM `bigquery-public-data.idc_current.dicom_all`
WHERE SeriesInstanceUID = "1.3.6.1.4.1.32722.99.99.298991776521342375010861296712563382046"
```
{% endcode %}

{% code overflow="wrap" %}
```sql
-# Select all files from GCS for a given DICOM study
-SELECT DISTINCT(CONCAT("cp s3://", SPLIT(gcs_url,"/")[SAFE_OFFSET(2)], "/", crdc_series_uuid, "/* ."))
+# Select all files for a given DICOM study
+SELECT DISTINCT(CONCAT(series_aws_url, "* ."))
FROM `bigquery-public-data.idc_current.dicom_all`
WHERE StudyInstanceUID = "1.3.6.1.4.1.32722.99.99.239341353911714368772597187099978969331"
```
{% endcode %}

-If you want to download the files corresponding to the cohort from AWS instead of GCP, substitute aws`_url` for gc`s_url` in the `SELECT` statement of the query, such as in the following SELECT clause:
+If you want to download the files corresponding to the cohort from GCP instead of AWS, substitute `series_aws_url` for `series_gcp_url` in the `SELECT` statement of the query, such as in the following SELECT clause:

{% code overflow="wrap" %}
```sql
-SELECT DISTINCT(CONCAT("cp s3://", SPLIT(aws_url,"/")[SAFE_OFFSET(2)], "/", crdc_series_uuid, "/* ."))
+SELECT DISTINCT(CONCAT(series_gcp_url, "* ."))
```
{% endcode %}
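
The hunk above swaps a hand-built URL for the precomputed `series_aws_url` column. A rough Python sketch of the two expressions follows. The bucket name, the UUID, and the assumption that `series_aws_url` has the form `s3://<bucket>/<crdc_series_uuid>/` are illustrative only and are not confirmed by this commit. Under that assumption, the two outputs differ only by the leading `cp `, which the removed expression emits and the added one does not.

```python
from typing import Optional


def removed_expression(file_url: str, crdc_series_uuid: str) -> Optional[str]:
    """Mimic CONCAT("cp s3://", SPLIT(url, "/")[SAFE_OFFSET(2)], "/", uuid, "/* .")."""
    parts = file_url.split("/")
    # SAFE_OFFSET(2) yields NULL when the index is out of range,
    # and CONCAT returns NULL if any argument is NULL.
    if len(parts) <= 2:
        return None
    bucket = parts[2]
    return "cp s3://" + bucket + "/" + crdc_series_uuid + "/* ."


def added_expression(series_url: str) -> str:
    """Mimic CONCAT(series_aws_url, "* .")."""
    return series_url + "* ."


# Hypothetical values, chosen only to illustrate the string handling.
example_file_url = "s3://example-bucket/0000-example-uuid/instance.dcm"
example_series_url = "s3://example-bucket/0000-example-uuid/"  # assumed form

old_line = removed_expression(example_file_url, "0000-example-uuid")
new_line = added_expression(example_series_url)
```

With these inputs, `old_line` is `cp s3://example-bucket/0000-example-uuid/* .` and `new_line` is `s3://example-bucket/0000-example-uuid/* .`.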

@@ -107,7 +107,7 @@ WHERE collection_id = "nsclc_radiomics"

[`s5cmd`](https://github.com/peak/s5cmd) is a very fast S3 and local filesystem execution tool that can be used for accessing IDC buckets and downloading files both from GCS and AWS.

-Install `s5cmd` following the instructions in [https://github.com/peak/s5cmd#installation](https://github.com/peak/s5cmd#installation).
+Install `s5cmd` following the instructions in [https://github.com/peak/s5cmd#installation](https://github.com/peak/s5cmd#installation), or if you have Python pip on you system you can just do `pip install s5cmd --upgrade`.

You can verify if your setup was successful by running the following command: it should successfully download one file from IDC.
