Merge branch 'prod' into sprint-54-gw
fedorov authored Apr 25, 2024
2 parents 37b4cfd + a6155da commit bdb3712
Showing 3 changed files with 72 additions and 13 deletions.
61 changes: 60 additions & 1 deletion data/data-release-notes.md
@@ -12,7 +12,64 @@ Please refer to the license and terms of use, which are defined in the `license_

## v18 - April 2024

-New clinical metadata tables
### New radiology collections

1. [Advanced-MRI-Breast-Lesions](https://doi.org/10.7937/C7X1-YN57)

### New analysis results

1. [RMS-Mutation-Prediction-Expert-Annotations](https://doi.org/10.5281/zenodo.10462857)\*\
Collections analyzed:
1. [RMS-Mutation-Prediction](https://doi.org/10.5281/zenodo.8225131)
2. [TotalSegmentator-CT-Segmentations](https://doi.org/10.5281/zenodo.8347011)\*\*\
Collections analyzed:
1. [NLST](https://doi.org/10.7937/TCIA.HMQ8-J677)

### Revised radiology collections

(starred collections are revised due to new or revised analysis results)

1. [Breast-Cancer-Screening-DBT](https://doi.org/10.7937/E4WT-CD02) (revisions only to clinical data)
2. [NLST](https://doi.org/10.7937/TCIA.HMQ8-J677)\*\*

### Revised pathology collections

(starred collections are revised due to new or revised analysis results)

1. [CPTAC-BRCA](https://doi.org/10.7937/TCIA.CAEM-YS80) (fix PatientAges > 090Y)
2. [CPTAC-COAD](https://doi.org/10.7937/TCIA.YZWQ-ZZ63) (fix PatientAges > 090Y)
3. [RMS-Mutation-Prediction](https://doi.org/10.5281/zenodo.8225131)\*
1. Also added missing instance \
SOPInstanceUID: 1.3.6.1.4.1.5962.99.1.3459553143.523311062.1687086765943.9.0
2. Removed corrupted instances
1. SOPInstanceUID: 1.3.6.1.4.1.5962.99.1.2164023716.1899467316.1685791236516.37.0
2. SOPInstanceUID: 1.3.6.1.4.1.5962.99.1.2411736851.773458418.1686038949651.37.0
3. SOPInstanceUID: 1.3.6.1.4.1.5962.99.1.2411736851.773458418.16860389
4. [TCGA-BLCA](https://doi.org/10.7937/K9/TCIA.2016.8LNG8XDR) (All TCGA revisions are to correct multiple manufacturer values within same series)
5. [TCGA-BRCA](https://doi.org/10.7937/K9/TCIA.2016.AB2NAZRP)
6. [TCGA-CHOL](https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga/studied-cancers/cholangiocarcinoma)
7. [TCGA-COAD](https://doi.org/10.7937/K9/TCIA.2016.HJJHBOXZ)
8. TCGA-DLBC (No description page)
9. [TCGA-ESCA](https://doi.org/10.7937/K9/TCIA.2016.VPTNRGFY)
10. [TCGA-HNSC](https://doi.org/10.7937/K9/TCIA.2016.LXKQ47MS)
11. [TCGA-KIRC](https://doi.org/10.7937/K9/TCIA.2016.V6PBVTDR)
12. [TCGA-KIRP](https://doi.org/10.7937/K9/TCIA.2016.ACWOGBEF)
13. [TCGA-LIHC](https://doi.org/10.7937/K9/TCIA.2016.IMMQW8UQ)
14. [TCGA-LUAD](https://doi.org/10.7937/K9/TCIA.2016.JGNIHEP5)
15. [TCGA-LUSC](https://doi.org/10.7937/K9/TCIA.2016.TYGKKFMQ)
16. [TCGA-PAAD](https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga/studied-cancers/pancreatic)
17. [TCGA-PRAD](https://doi.org/10.7937/K9/TCIA.2016.YXOGLM4Y)
18. [TCGA-READ](https://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU)
19. [TCGA-SARC](https://doi.org/10.7937/K9/TCIA.2016.CX6YLSUX)
20. [TCGA-SKCM](https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga/studied-cancers/melanoma-skin)
21. [TCGA-STAD](https://doi.org/10.7937/K9/TCIA.2016.GDHL9KIM)
22. [TCGA-TGCT](https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga/studied-cancers/testicular-germ-cell)
23. [TCGA-THCA](https://doi.org/10.7937/K9/TCIA.2016.9ZFRVF1B)
24. [TCGA-THYM](https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga/studied-cancers/thymoma)
25. [TCGA-UCEC](https://doi.org/10.7937/K9/TCIA.2016.GKJ0ZWAC)
26. [TCGA-UCS](https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga/studied-cancers/uterine-carcinosarcoma)

### New clinical metadata tables

1. [acrin_nsclc_fdg_pet_bamf_lung_pet_ct_segmentation](https://portal.imaging.datacommons.cancer.gov/explore/filters/?collection\_id=acrin\_nsclc\_fdg\_pet)
2. [anti_pd_1_lung_bamf_lung_ct_segmentation](https://portal.imaging.datacommons.cancer.gov/explore/filters/?collection\_id=anti\_pd\_1\_lung)
@@ -34,7 +91,9 @@ New clinical metadata tables
18. [tcga_lusc_bamf_lung_mr_segmentation](https://portal.imaging.datacommons.cancer.gov/explore/filters/?collection\_id=tcga\_lusc)


### Notes

The deprecated columns `tcia_api_collection_id` and `idc_webapp_collection_id` have been removed from the `auxiliary_metadata` table in the `idc_v18` BQ dataset. These columns were duplicates of columns `collection_name` and `collection_id` respectively. 
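
For queries written against earlier dataset versions, the rename can be applied mechanically. The sketch below is an illustration only and is not part of the release notes: `migrate_query` is a hypothetical helper that assumes the removed column names appear in the query text as plain identifiers, and it maps each one to the retained column named above.

```python
import re

# Removed column -> retained duplicate, as stated in the note above.
RENAMED_COLUMNS = {
    "tcia_api_collection_id": "collection_name",
    "idc_webapp_collection_id": "collection_id",
}

# Match the removed names only as whole identifiers.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in RENAMED_COLUMNS) + r")\b"
)


def migrate_query(sql: str) -> str:
    """Return `sql` with removed column names replaced by their retained duplicates."""
    return _PATTERN.sub(lambda match: RENAMED_COLUMNS[match.group(1)], sql)


if __name__ == "__main__":
    print(migrate_query("SELECT idc_webapp_collection_id FROM auxiliary_metadata"))
```

A query that already selects both the old and the new column would end up selecting the same column twice after rewriting, so the result is worth reviewing by hand.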

## v17 - December 2023

2 changes: 1 addition & 1 deletion data/downloading-data/README.md
@@ -34,7 +34,7 @@ Once you installed the package with pip install idc-index, you can use it to exp
You can also take a look at a short tutorial on using `idc-index` [here](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/labs/idc\_rsna2023.ipynb).

```shell-session
-pip install idc-index
+pip install idc-index --upgrade
```
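
After upgrading, it can be useful to confirm which release is actually installed in the active environment. The following is a minimal sketch using only the Python standard library; `installed_version` is a hypothetical helper name, and the distribution name is assumed to match the one used in the `pip` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="idc-index"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "idc-index is not installed")
```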

```python
22 changes: 11 additions & 11 deletions data/downloading-data/downloading-data-with-s5cmd.md
Expand Up @@ -31,45 +31,45 @@ Queries below demonstrate how to get the Google Storage URLs to download cohort

{% code overflow="wrap" %}
```sql
-# Select all files from GCS for a given PatientID
-SELECT DISTINCT(CONCAT("cp s3://", SPLIT(gcs_url,"/")[SAFE_OFFSET(2)], "/", crdc_series_uuid, "/* ."))
+# Select all files for a given PatientID
+SELECT DISTINCT(CONCAT(series_aws_url, "* ."))
FROM `bigquery-public-data.idc_current.dicom_all`
WHERE PatientID = "LUNG1-001"
```
{% endcode %}

{% code overflow="wrap" %}
```sql
-# Select all files from GCS for a given collection
-SELECT DISTINCT(CONCAT("cp s3://", SPLIT(gcs_url,"/")[SAFE_OFFSET(2)], "/", crdc_series_uuid, "/* ."))
+# Select all files for a given collection
+SELECT DISTINCT(CONCAT(series_aws_url, "* ."))
FROM `bigquery-public-data.idc_current.dicom_all`
WHERE collection_id = "nsclc_radiomics"
```
{% endcode %}

{% code overflow="wrap" %}
```sql
-# Select all files from GCS for a given DICOM series
-SELECT DISTINCT(CONCAT("cp s3://", SPLIT(gcs_url,"/")[SAFE_OFFSET(2)], "/", crdc_series_uuid, "/* ."))
+# Select all files for a given DICOM series
+SELECT DISTINCT(CONCAT(series_aws_url, "* ."))
FROM `bigquery-public-data.idc_current.dicom_all`
WHERE SeriesInstanceUID = "1.3.6.1.4.1.32722.99.99.298991776521342375010861296712563382046"
```
{% endcode %}

{% code overflow="wrap" %}
```sql
-# Select all files from GCS for a given DICOM study
-SELECT DISTINCT(CONCAT("cp s3://", SPLIT(gcs_url,"/")[SAFE_OFFSET(2)], "/", crdc_series_uuid, "/* ."))
+# Select all files for a given DICOM study
+SELECT DISTINCT(CONCAT(series_aws_url, "* ."))
FROM `bigquery-public-data.idc_current.dicom_all`
WHERE StudyInstanceUID = "1.3.6.1.4.1.32722.99.99.239341353911714368772597187099978969331"
```
{% endcode %}

-If you want to download the files corresponding to the cohort from AWS instead of GCP, substitute aws`_url` for gc`s_url` in the `SELECT` statement of the query, such as in the following SELECT clause:
+If you want to download the files corresponding to the cohort from GCP instead of AWS, substitute `series_gcp_url` for `series_aws_url` in the `SELECT` statement of the query, as in the following `SELECT` clause:

{% code overflow="wrap" %}
```sql
-SELECT DISTINCT(CONCAT("cp s3://", SPLIT(aws_url,"/")[SAFE_OFFSET(2)], "/", crdc_series_uuid, "/* ."))
+SELECT DISTINCT(CONCAT(series_gcp_url, "* ."))
```
{% endcode %}
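
The queries in this file differ only in the filter column and the URL column, so they can also be assembled programmatically. The following is a minimal sketch and not part of the commit: `build_manifest_query` and `to_s5cmd_lines` are hypothetical helper names, the column names are taken from the queries as they appear here, and the sample URL in the usage lines is made up. It assumes each row returned by the query is a string of the form `<series URL>* .`, and that the manifest will be passed to `s5cmd run`, which expects a command such as `cp` at the start of each line.

```python
# Filter columns used by the queries in this document.
ALLOWED_FILTERS = {"PatientID", "collection_id", "SeriesInstanceUID", "StudyInstanceUID"}

# URL column per provider, as named in this document (an assumption beyond that).
URL_COLUMNS = {"aws": "series_aws_url", "gcp": "series_gcp_url"}


def build_manifest_query(filter_column, value, provider="aws"):
    """Build the SELECT statement for one filter value and one provider."""
    if filter_column not in ALLOWED_FILTERS:
        raise ValueError(f"unsupported filter column: {filter_column}")
    if provider not in URL_COLUMNS:
        raise ValueError(f"unsupported provider: {provider}")
    if '"' in value or "\\" in value:
        # Simple guard only; this is not a general SQL escaping routine.
        raise ValueError("value must not contain quotes or backslashes")
    url_column = URL_COLUMNS[provider]
    return (
        f'SELECT DISTINCT(CONCAT({url_column}, "* ."))\n'
        "FROM `bigquery-public-data.idc_current.dicom_all`\n"
        f'WHERE {filter_column} = "{value}"'
    )


def to_s5cmd_lines(rows):
    """Turn query result rows into manifest lines, adding `cp` where it is missing."""
    lines = []
    for row in rows:
        row = row.strip()
        if not row:
            continue
        lines.append(row if row.startswith("cp ") else f"cp {row}")
    return lines


if __name__ == "__main__":
    print(build_manifest_query("PatientID", "LUNG1-001"))
    print(to_s5cmd_lines(["s3://example-bucket/example-series-uuid/* ."]))
```

The resulting lines can be written to a text file, one command per line, for use as an `s5cmd run` manifest.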

Expand Down Expand Up @@ -107,7 +107,7 @@ WHERE collection_id = "nsclc_radiomics"

[`s5cmd`](https://github.com/peak/s5cmd) is a very fast S3 and local filesystem execution tool that can be used for accessing IDC buckets and downloading files both from GCS and AWS.

-Install `s5cmd` following the instructions in [https://github.com/peak/s5cmd#installation](https://github.com/peak/s5cmd#installation).
+Install `s5cmd` following the instructions in [https://github.com/peak/s5cmd#installation](https://github.com/peak/s5cmd#installation), or, if you have Python pip on your system, run `pip install s5cmd --upgrade`.
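
Before attempting a download, a script can check that the `s5cmd` executable is reachable on `PATH`. This is a minimal sketch under stated assumptions rather than part of the documented setup: `s5cmd_version` is a hypothetical helper name, and it relies on the `version` subcommand of `s5cmd` to report the installed release.

```python
import shutil
import subprocess


def s5cmd_version():
    """Return the version text reported by s5cmd, or None if it is not on PATH."""
    executable = shutil.which("s5cmd")
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "version"],
        capture_output=True,
        text=True,
        check=False,
        timeout=30,
    )
    # Fall back to stderr in case the version text is not written to stdout.
    return result.stdout.strip() or result.stderr.strip()


if __name__ == "__main__":
    found = s5cmd_version()
    print(found if found is not None else "s5cmd was not found on PATH")
```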

You can verify if your setup was successful by running the following command: it should successfully download one file from IDC.

