diff --git a/data/data-release-notes.md b/data/data-release-notes.md index 46eabba..cc243d1 100644 --- a/data/data-release-notes.md +++ b/data/data-release-notes.md @@ -12,7 +12,64 @@ Please refer to the license and terms of use, which are defined in the `license_ ## v18 - April 2024 -New clinical metadata tables +### New radiology collections + +1. [Advanced-MRI-Breast-Lesions](https://doi.org/10.7937/C7X1-YN57) + +### New analysis results + +1. [RMS-Mutation-Prediction-Expert-Annotations](https://doi.org/10.5281/zenodo.10462857)\*\ + Collections analyzed: + 1. [RMS-Mutation-Prediction](https://doi.org/10.5281/zenodo.8225131) +2. [TotalSegmentator-CT-Segmentations](https://doi.org/10.5281/zenodo.8347011)\*\*\ + Collections analyzed: + 1. [NLST](https://doi.org/10.7937/TCIA.HMQ8-J677) + +### Revised radiology collections + +(starred collections are revised due to new or revised analysis results) + +1. [Breast-Cancer-Screening-DBT](https://doi.org/10.7937/E4WT-CD02) (revisions only to clinical data) +2. [NLST](https://doi.org/10.7937/TCIA.HMQ8-J677)\*\* + +### Revised pathology collections + +(starred collections are revised due to new or revised analysis results) + +1. [CPTAC-BRCA](https://doi.org/10.7937/TCIA.CAEM-YS80) (fix PatientAges > 090Y) +2. [CPTAC-COAD](https://doi.org/10.7937/TCIA.YZWQ-ZZ63) (fix PatientAges > 090Y) +3. [RMS-Mutation-Prediction](https://doi.org/10.5281/zenodo.8225131)\* + 1. Also added missing instance \ + SOPInstanceUID: 1.3.6.1.4.1.5962.99.1.3459553143.523311062.1687086765943.9.0 + 2. Removed corrupted instances + 1. SOPInstanceUID: 1.3.6.1.4.1.5962.99.1.2164023716.1899467316.1685791236516.37.0 + 2. SOPInstanceUID: 1.3.6.1.4.1.5962.99.1.2411736851.773458418.1686038949651.37.0 + 3. SOPInstanceUID: 1.3.6.1.4.1.5962.99.1.2411736851.773458418.16860389 +4. [TCGA-BLCA](https://doi.org/10.7937/K9/TCIA.2016.8LNG8XDR) (All TCGA revisions are to correct multiple manufacturer values within same series) +5. 
[TCGA-BRCA](https://doi.org/10.7937/K9/TCIA.2016.AB2NAZRP) +6. [TCGA-CHOL](https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga/studied-cancers/cholangiocarcinoma) +7. [TCGA-COAD](https://doi.org/10.7937/K9/TCIA.2016.HJJHBOXZ) +8. TCGA-DLBC (No description page) +9. [TCGA-ESCA](https://doi.org/10.7937/K9/TCIA.2016.VPTNRGFY) +10. [TCGA-HNSC](https://doi.org/10.7937/K9/TCIA.2016.LXKQ47MS) +11. [TCGA-KIRC](https://doi.org/10.7937/K9/TCIA.2016.V6PBVTDR) +12. [TCGA-KIRP](https://doi.org/10.7937/K9/TCIA.2016.ACWOGBEF) +13. [TCGA-LIHC](https://doi.org/10.7937/K9/TCIA.2016.IMMQW8UQ) +14. [TCGA-LUAD](https://doi.org/10.7937/K9/TCIA.2016.JGNIHEP5) +15. [TCGA-LUSC](https://doi.org/10.7937/K9/TCIA.2016.TYGKKFMQ) +16. [TCGA-PAAD](https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga/studied-cancers/pancreatic) +17. [TCGA-PRAD](https://doi.org/10.7937/K9/TCIA.2016.YXOGLM4Y) +18. [TCGA-READ](https://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU) +19. [TCGA-SARC](https://doi.org/10.7937/K9/TCIA.2016.CX6YLSUX) +20. [TCGA-SKCM](https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga/studied-cancers/melanoma-skin) +21. [TCGA-STAD](https://doi.org/10.7937/K9/TCIA.2016.GDHL9KIM) +22. [TCGA-TGCT](https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga/studied-cancers/testicular-germ-cell) +23. [TCGA-THCA](https://doi.org/10.7937/K9/TCIA.2016.9ZFRVF1B) +24. [TCGA-THYM](https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga/studied-cancers/thymoma) +25. [TCGA-UCEC](https://doi.org/10.7937/K9/TCIA.2016.GKJ0ZWAC) +26. [TCGA-UCS](https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga/studied-cancers/uterine-carcinosarcoma) + +### New clinical metadata tables 1. [acrin_nsclc_fdg_pet_bamf_lung_pet_ct_segmentation](https://portal.imaging.datacommons.cancer.gov/explore/filters/?collection\_id=acrin\_nsclc\_fdg\_pet) 2. 
[anti_pd_1_lung_bamf_lung_ct_segmentation](https://portal.imaging.datacommons.cancer.gov/explore/filters/?collection\_id=anti\_pd\_1\_lung) @@ -34,7 +91,9 @@ New clinical metadata tables 18. [tcga_lusc_bamf_lung_mr_segmentation](https://portal.imaging.datacommons.cancer.gov/explore/filters/?collection\_id=tcga\_lusc) +### Notes +The deprecated columns `tcia_api_collection_id` and `idc_webapp_collection_id` have been removed from the `auxiliary_metadata` table in the `idc_v18` BQ dataset. These columns were duplicates of columns `collection_name` and `collection_id` respectively. ## v17 - December 2023 diff --git a/data/downloading-data/README.md b/data/downloading-data/README.md index 8deaf68..b242c83 100644 --- a/data/downloading-data/README.md +++ b/data/downloading-data/README.md @@ -34,7 +34,7 @@ Once you installed the package with pip install idc-index, you can use it to exp You can also take a look at a short tutorial on using `idc-index` [here](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/labs/idc\_rsna2023.ipynb). 
```shell-session -pip install idc-index +pip install idc-index --upgrade ``` ```python diff --git a/data/downloading-data/downloading-data-with-s5cmd.md b/data/downloading-data/downloading-data-with-s5cmd.md index d428824..4291cd1 100644 --- a/data/downloading-data/downloading-data-with-s5cmd.md +++ b/data/downloading-data/downloading-data-with-s5cmd.md @@ -31,8 +31,8 @@ Queries below demonstrate how to get the Google Storage URLs to download cohort {% code overflow="wrap" %} ```sql -# Select all files from GCS for a given PatientID -SELECT DISTINCT(CONCAT("cp s3://", SPLIT(gcs_url,"/")[SAFE_OFFSET(2)], "/", crdc_series_uuid, "/* .")) +# Select all files for a given PatientID +SELECT DISTINCT(CONCAT(series_aws_url, "* .")) FROM `bigquery-public-data.idc_current.dicom_all` WHERE PatientID = "LUNG1-001" ``` @@ -40,8 +40,8 @@ WHERE PatientID = "LUNG1-001" {% code overflow="wrap" %} ```sql -# Select all files from GCS for a given collection -SELECT DISTINCT(CONCAT("cp s3://", SPLIT(gcs_url,"/")[SAFE_OFFSET(2)], "/", crdc_series_uuid, "/* .")) +# Select all files for a given collection +SELECT DISTINCT(CONCAT(series_aws_url, "* .")) FROM `bigquery-public-data.idc_current.dicom_all` WHERE collection_id = "nsclc_radiomics" ``` @@ -49,8 +49,8 @@ WHERE collection_id = "nsclc_radiomics" {% code overflow="wrap" %} ```sql -# Select all files from GCS for a given DICOM series -SELECT DISTINCT(CONCAT("cp s3://", SPLIT(gcs_url,"/")[SAFE_OFFSET(2)], "/", crdc_series_uuid, "/* .")) +# Select all files for a given DICOM series +SELECT DISTINCT(CONCAT(series_aws_url, "* .")) FROM `bigquery-public-data.idc_current.dicom_all` WHERE SeriesInstanceUID = "1.3.6.1.4.1.32722.99.99.298991776521342375010861296712563382046" ``` @@ -58,18 +58,18 @@ WHERE SeriesInstanceUID = "1.3.6.1.4.1.32722.99.99.29899177652134237501086129671 {% code overflow="wrap" %} ```sql -# Select all files from GCS for a given DICOM study -SELECT DISTINCT(CONCAT("cp s3://", SPLIT(gcs_url,"/")[SAFE_OFFSET(2)], "/", 
crdc_series_uuid, "/* .")) +# Select all files for a given DICOM study +SELECT DISTINCT(CONCAT(series_aws_url, "* .")) FROM `bigquery-public-data.idc_current.dicom_all` WHERE StudyInstanceUID = "1.3.6.1.4.1.32722.99.99.239341353911714368772597187099978969331" ``` {% endcode %} -If you want to download the files corresponding to the cohort from AWS instead of GCP, substitute aws`_url` for gc`s_url` in the `SELECT` statement of the query, such as in the following SELECT clause: +If you want to download the files corresponding to the cohort from GCP instead of AWS, replace `series_aws_url` with `series_gcp_url` in the `SELECT` statement of the query, as in the following `SELECT` clause: {% code overflow="wrap" %} ```sql -SELECT DISTINCT(CONCAT("cp s3://", SPLIT(aws_url,"/")[SAFE_OFFSET(2)], "/", crdc_series_uuid, "/* .")) +SELECT DISTINCT(CONCAT(series_gcp_url, "* .")) ``` {% endcode %} @@ -107,7 +107,7 @@ WHERE collection_id = "nsclc_radiomics" [`s5cmd`](https://github.com/peak/s5cmd) is a very fast S3 and local filesystem execution tool that can be used for accessing IDC buckets and downloading files both from GCS and AWS. -Install `s5cmd` following the instructions in [https://github.com/peak/s5cmd#installation](https://github.com/peak/s5cmd#installation). +Install `s5cmd` following the instructions in [https://github.com/peak/s5cmd#installation](https://github.com/peak/s5cmd#installation), or, if you have Python `pip` on your system, run `pip install s5cmd --upgrade`. You can verify if your setup was successful by running the following command: it should successfully download one file from IDC.