Add support for download of single instances instead of whole series #97
Follow up on #15 and ImagingDataCommons/idc-index#97. This adds selected attributes that vary across individual instances within the series, tying them to the instances via SOPInstanceUID. Series-level attributes (including the storage bucket folder containing the series) are expected to be accessible by a JOIN on the included SeriesInstanceUID.
* enh: add instance-level SM index
* enh: added instance file size
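The JOIN described above could look roughly like the sketch below. It uses Python's built-in `sqlite3` as a stand-in for whatever engine the client uses, and the table names (`series_index`, `instance_index`) and the columns `series_folder` and `instance_size` are hypothetical placeholders; only `SOPInstanceUID` and `SeriesInstanceUID` come from the description.

```python
import sqlite3

# In-memory stand-in for the two index files. Table names and the columns
# series_folder / instance_size are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE series_index (
        SeriesInstanceUID TEXT PRIMARY KEY,
        series_folder TEXT
    );
    CREATE TABLE instance_index (
        SOPInstanceUID TEXT PRIMARY KEY,
        SeriesInstanceUID TEXT,
        instance_size INTEGER
    );
    INSERT INTO series_index VALUES ('1.2.3', 's3://example-bucket/series-a/');
    INSERT INTO series_index VALUES ('1.2.4', 's3://example-bucket/series-b/');
    INSERT INTO instance_index VALUES ('1.2.3.1', '1.2.3', 1000);
    INSERT INTO instance_index VALUES ('1.2.3.2', '1.2.3', 250000);
    INSERT INTO instance_index VALUES ('1.2.4.1', '1.2.4', 4000);
    """
)


def lookup_instance(sop_instance_uid):
    """Return (SOPInstanceUID, instance_size, series_folder) or None."""
    return conn.execute(
        """
        SELECT i.SOPInstanceUID, i.instance_size, s.series_folder
        FROM instance_index AS i
        JOIN series_index AS s USING (SeriesInstanceUID)
        WHERE i.SOPInstanceUID = ?
        """,
        (sop_instance_uid,),
    ).fetchone()


print(lookup_instance("1.2.3.2"))
```

The instance-level table only carries what varies per instance; everything else is reached through the shared `SeriesInstanceUID`.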
Proposal for implementation in case the instance-level index file is added automatically in the same way as the series-level index during IDCClient instantiation:
Proposal for implementation in case we don't add the instance-level index automatically.
Let me know what you think or if I've missed something.
@DanielaSchacherer The pathology client should be able to do everything that idc_index already does, simply by doing a join on the series-level index. Yes, we will need to rewrite all endpoints available in idc_index again, but it should not be hard. After that we can have
I like that idea, Vamsi.
Sorry I missed this discussion earlier. I would strongly prefer not to introduce an additional class for pathology. Going forward, we will only have more subject-specific indices (clinical data, segmentations, measurements, CT, MR, RTSTRUCT), so the complexity will only increase. We will often need to join multiple indices, and I think it makes more sense to join them in the same class. Creating new clients would also require unnecessary duplication of the main index.

Also, I do not see any need to separate pathology-specific functionality. What is new is instance-level access, which may be justifiable in situations beyond pathology.

Instead, I suggest (as I have proposed and discussed earlier on several occasions) allowing the user to download additional indices on request. I have no concerns about adding the endpoints @DanielaSchacherer proposed. Whenever instance-level access is requested, we could search all of the indices that are available locally, identify those that contain Along those lines, we would just need to add the following endpoints to the existing
I don't have a good idea of how I'd implement this, as there are too many variables. Please go ahead and take a swing at it if you can, @DanielaSchacherer.
If you need any help with unnesting with duckdb, I can certainly help. Here's one foolproof way of unnesting with duckdb. It is slow, but I guarantee the accuracy of the unnesting.
I don't have much time left to work on IDC this month, so progress might be slow, but I can certainly work on it! :)
Started with this in #101.
I am seeing these comments just now; I was away last week. I will review the PR from Daniela after it clears the CI checks and comment then.
Here is my proposal (slightly adapted from the one in a previous comment) on how to proceed with the implementation of instance-level access for pathology slides:
Sounds good to me overall.
Would it make sense to join it when the instance index is loaded, to include the attributes needed for building the folder hierarchy? Or should we join every time we need those attributes when a download is requested?
That would make a lot of sense. I did that. I opened a new PR with some comments and things that we might need to discuss: #112
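To make the "join once at load time" option concrete, here is a minimal sketch in plain Python. It is not the code from #112; the class name `InstanceIndex`, the folder template, and every attribute other than `SOPInstanceUID` and `SeriesInstanceUID` are assumptions chosen for illustration.

```python
class InstanceIndex:
    """Instance rows enriched with series-level attributes once, at load time."""

    def __init__(self, series_rows, instance_rows):
        # Index the series-level rows by SeriesInstanceUID for fast lookup.
        series_by_uid = {row["SeriesInstanceUID"]: row for row in series_rows}
        # Perform the join a single time and keep the merged rows in memory.
        self.rows = {}
        for inst in instance_rows:
            series = series_by_uid[inst["SeriesInstanceUID"]]
            self.rows[inst["SOPInstanceUID"]] = {**series, **inst}

    def target_dir(
        self,
        sop_instance_uid,
        template="{collection_id}/{PatientID}/{StudyInstanceUID}/{Modality}_{SeriesInstanceUID}",
    ):
        # The template is a hypothetical folder hierarchy; unused keys in the
        # merged row are simply ignored by str.format.
        return template.format(**self.rows[sop_instance_uid])


series_rows = [
    {
        "SeriesInstanceUID": "1.2.3",
        "collection_id": "demo_collection",
        "PatientID": "P1",
        "StudyInstanceUID": "1.2",
        "Modality": "SM",
    }
]
instance_rows = [{"SOPInstanceUID": "1.2.3.1", "SeriesInstanceUID": "1.2.3"}]

index = InstanceIndex(series_rows, instance_rows)
print(index.target_dir("1.2.3.1"))
```

Joining at load time costs some memory up front, but each download request then needs only a dictionary lookup instead of a fresh join.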
For pathology use cases, it is crucial to be able to download only single levels (and save the effort of downloading a huge amount of data that is not actually needed).
I think this could be added in the context of adding the instance-level slide-microscopy index (sm_index).
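As a rough illustration of what per-instance download could involve, the sketch below builds one copy command per selected instance instead of one per series folder. The `cp <source> <destination>` line format follows the s5cmd command syntax; the assumption that each instance is a single file inside the series folder, the file names, and the helper name `instance_copy_commands` are all hypothetical.

```python
def instance_copy_commands(instances, dest_dir):
    """Build one 'cp' line per instance.

    instances: iterable of (series_folder_url, instance_file_name) pairs.
    Both values are assumed to come from the joined series-level and
    instance-level indices; the file naming is an assumption of this sketch.
    """
    commands = []
    for series_url, file_name in instances:
        # Drop a trailing '/' or '/*' wildcard from the series folder URL.
        source = series_url.rstrip("/*") + "/" + file_name
        commands.append(f"cp {source} {dest_dir}")
    return commands


selected = [
    ("s3://example-bucket/series-a/*", "instance-1.dcm"),
    ("s3://example-bucket/series-a/", "instance-2.dcm"),
]
for line in instance_copy_commands(selected, "downloads/"):
    print(line)
```

Only the listed files are transferred, so a single pyramid level can be fetched without pulling the rest of the series.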