Name external store files with primary key when downloaded #1099

Open
ghost opened this issue Jul 14, 2023 · 1 comment

ghost commented Jul 14, 2023

Feature Request

Problem

In many pipelines, external store files may have identical names. For example, an experimenter may name all their raw electrophysiology data 'data.bin'. When this data is fetched, the external files are downloaded into the working directory under just their original names, so every file that shares a name overwrites the previous one.

Note that I am referring to the name of the file as it is stored on the local system before inserting and after fetching, not the name of the file in the store itself, which is always unique because it is a hash.

Requirements

Provide an option in fetch to include the primary key for that entry in the downloaded file name. Instead of 'data.bin', the file downloaded from the store would be named 'PRIMARY-KEY-data.bin'.
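To make the requested naming concrete, here is a minimal sketch of how such a file name could be built. It is plain Python, not DataJoint code: the function `keyed_filename`, the `sep` parameter, and the example key attributes `subject_id` and `session` are all hypothetical and only illustrate the 'PRIMARY-KEY-data.bin' pattern.

```python
from pathlib import PurePath


def keyed_filename(key: dict, filename: str, sep: str = "-") -> str:
    """Prefix a file name with the primary key values, in key order.

    Hypothetical helper for illustration; not part of DataJoint.
    """
    parts = [str(value) for value in key.values()]
    # Replace path separators so a key value cannot point outside the download directory.
    safe = [p.replace("/", "_").replace("\\", "_") for p in parts]
    # Keep only the final path component of the original file name.
    return sep.join(safe + [PurePath(filename).name])


key = {"subject_id": 12, "session": 3}  # assumed example primary key
print(keyed_filename(key, "data.bin"))  # 12-3-data.bin
```

Because the primary key is unique per entry, two entries that both hold a 'data.bin' would get different local names under this scheme.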

Justification

This will allow users to fetch data and download files from the external store even when those files have identical names.

Alternative Considerations

The alternative would be to force users to name all files uniquely. This is not helpful for some use cases. For example, in an electrophysiology pipeline, raw output may always be named 'data.bin' by the equipment, and the user may then upload it directly to their DataJoint pipeline. It would be inconvenient to have to rename these files first.

Screenshots

In this screenshot, I show the result of a fetch on my database. Note that our equipment always names our raw electrophysiology data 'data.bin', but these are different files, which were stored in different directories before being uploaded to our DataJoint pipeline. Here, they overwrite one another, and my downloads folder ends up with only one 'data.bin' file.

Screenshot from 2023-07-14 13-48-57

Additional Research and Context

Note: I have only tested this with an S3 external store, not a file store.

@ghost ghost added the enhancement label Jul 14, 2023
@dimitri-yatsenko dimitri-yatsenko self-assigned this Jul 17, 2023
Contributor

horsto commented Sep 21, 2023

I ran into the same inconvenience, and I now rename all files before upload so that their names include the complete primary key. I agree, though, that this can be inconvenient.
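For anyone who needs this workaround in the meantime, a rough sketch of the rename-before-upload step could look like the following. It uses only the Python standard library; `copy_with_key`, the staging directory, and the key attributes `subject_id` and `session` are assumptions made for illustration, and the staged path is what one would then insert into the table in place of the original file.

```python
import shutil
import tempfile
from pathlib import Path


def copy_with_key(src, key: dict, staging_dir) -> Path:
    """Copy src into staging_dir under a name that embeds the primary key.

    Hypothetical helper for illustration; not part of DataJoint.
    """
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    # Build a prefix such as "subject_id-12_session-3" from the key.
    prefix = "_".join(f"{name}-{value}" for name, value in key.items())
    dest = staging / f"{prefix}_{Path(src).name}"
    shutil.copy2(src, dest)  # the original file is left untouched
    return dest


# Demonstration with a throwaway file standing in for the rig output.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "rig_output" / "data.bin"
    raw.parent.mkdir()
    raw.write_bytes(b"\x00\x01")
    staged = copy_with_key(raw, {"subject_id": 12, "session": 3}, Path(tmp) / "staging")
    staged_name = staged.name
    print(staged_name)  # subject_id-12_session-3_data.bin
```

Copying rather than renaming in place leaves the original recording where the equipment wrote it, at the cost of temporary extra disk space.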
