Add mechanism for storing ingested tabular data files WITHOUT the variable name header #8524

landreev · 2022-03-22T18:41:05Z

As currently implemented, a tabular file in storage contains only the data rows. The header is added in real time, every time the file is downloaded, and is generated from the DataVariable objects in the database. There was some ancient legacy reason for this implementation; it's not clear whether it's still relevant, and storing the header with the file would simplify the download framework a whole lot.
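A minimal sketch of the current behavior, in Python for brevity (Dataverse itself is written in Java). It assumes the header is simply the variable names joined by tabs and terminated by a newline; the function names and the list-of-names input are hypothetical stand-ins for reading the DataVariable objects, not the actual Dataverse code.

```python
from typing import Iterable, Iterator, List


def generate_var_header(variable_names: List[str]) -> bytes:
    """Build a tab-delimited header line from variable names.

    Hypothetical stand-in for reading the names from the DataVariable
    records in the database (assumes unquoted, tab-joined names).
    """
    return ("\t".join(variable_names) + "\n").encode("utf-8")


def stream_with_dynamic_header(
    stored_rows: Iterable[bytes], variable_names: List[str]
) -> Iterator[bytes]:
    """Yield the generated header first, then the stored data rows unchanged.

    The file in storage holds only data rows, so every download has to
    prepend the header on the fly.
    """
    yield generate_var_header(variable_names)
    yield from stored_rows


# Usage: two stored rows, two variables.
print(b"".join(stream_with_dynamic_header([b"1\t2\n", b"3\t4\n"], ["x", "y"])))
```

Every download path (direct streaming, zipped bundles, subsets) has to remember to call something like `generate_var_header`, which is the complexity the issue proposes to remove.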

Potential drawbacks:

- Would require a migration process to add the header to any existing tab files.
- If we ever add a mechanism for changing the names of data variables, the stored file would need to be updated (the next time the dataset is published, for example?) to reflect the changes in the header.
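The two drawbacks above could look roughly like this, as a sketch under the same assumptions (tab-joined, unquoted names; hypothetical function names). `add_header` is the one-time migration step for an existing headerless file; `replace_header` is what a rename would require for a file already stored with its header.

```python
from typing import List


def add_header(stored: bytes, variable_names: List[str]) -> bytes:
    """Migration step: prepend the header to a headerless tab file."""
    header = "\t".join(variable_names) + "\n"
    return header.encode("utf-8") + stored


def replace_header(stored_with_header: bytes, new_names: List[str]) -> bytes:
    """Rewrite the first line after variables have been renamed.

    Assumes the file already starts with a header line terminated by '\\n'.
    """
    newline = stored_with_header.find(b"\n")
    if newline == -1:
        raise ValueError("file has no terminated header line")
    body = stored_with_header[newline + 1:]  # data rows, untouched
    return add_header(body, new_names)


migrated = add_header(b"1\t2\n", ["x", "y"])
renamed = replace_header(migrated, ["age", "income"])
```

Both operations produce a new copy of the whole file, so on object stores such as S3, where objects cannot be modified in place, a rename means rewriting the full stored object.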


…10282)

* "stored with header" flag #8524
* more changes for the streaming and redirect code. #8524
* disabling dynamically-generated varheader in the remaining storage drivers. #8524
* Ingest plugins (work in progress) #8524
* R ingest plugin (#8524)
* still some unaddressed @todo:s, but the branch should build and the unit tests should be passing. # 8524
* work-in-progress, on the subsetting code in the download instance writer. #8524
* more work-in-progress changes. removing all the unused code from TabularSubsetGenerator, for clarity etc. #8524
* more bits and pieces #8524
* 2 more ingest plugins. #8542
* Integration tests. #8524
* typo #8524
* documenting the new setting. #8524
* a release note for the pr. also, added the "storage quotas enabled" to the list of settings documented in the config guide while I was at it. #8524
* removed all the unused code from this class (lots of it) for clarity, etc. git history can be consulted if anyone is curious about what we used to do here. #8524
* removing @todo: that's no longer relevant #8524
* (cosmetic) defined the control constants used in the integration test. #8524
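One way the "stored with header" flag from these commits might drive the download and subsetting paths, sketched under stated assumptions: `stored_with_header` is a hypothetical boolean standing in for the flag, `columns` is a hypothetical list of zero-based column indexes for a subset request, and fields contain no embedded tabs or newlines. This is an illustration, not the code merged in #10282.

```python
from typing import List, Optional


def read_for_download(
    stored: bytes,
    variable_names: List[str],
    stored_with_header: bool,
    columns: Optional[List[int]] = None,
) -> bytes:
    """Return the tab file (or a column subset) as delivered to the user."""
    if columns is None:
        if stored_with_header:
            # Header already in storage: the bytes can be served as-is,
            # which is what makes a direct (e.g. redirect) download possible.
            return stored
        # Legacy headerless file: fall back to a generated header.
        return ("\t".join(variable_names) + "\n").encode("utf-8") + stored

    lines = stored.decode("utf-8").splitlines()
    # Skip the stored header line, if any; the subset header is rebuilt
    # from the variable names so both kinds of file behave the same.
    data_rows = lines[1:] if stored_with_header else lines
    out = ["\t".join(variable_names[i] for i in columns)]
    for row in data_rows:
        fields = row.split("\t")
        out.append("\t".join(fields[i] for i in columns))
    return ("\n".join(out) + "\n").encode("utf-8")


names = ["x", "y", "z"]
with_header = b"x\ty\tz\n1\t2\t3\n4\t5\t6\n"
subset = read_for_download(with_header, names, True, [0, 2])
```

The design point is the first branch: when the header is already in storage, a full download needs no processing at all. A file without the flag still gets the dynamically generated header, so in this sketch headerless files stored earlier keep working without migration.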

This was referenced Mar 22, 2022

Refactor and rethink the tabular ingest subsystem in v6. #8526

Open

Tab files downloaded through Zipper do not include headers #8485

Open

This was referenced Feb 7, 2023

Dataverse content provider: download files in original format jupyterhub/repo2docker#1242

Closed

Fix Binder and Whole Tale (repo2docker) to download original files rather than archival .tab files #9374

Closed

cmbz added this to IQSS Dataverse Project Oct 11, 2023

cmbz moved this to Release 6.1 Proposals in IQSS Dataverse Project Oct 11, 2023

cmbz moved this to This Sprint 🏃‍♀️ 🏃 in IQSS Dataverse Project Jan 17, 2024

landreev added the Size: 30 label (a percentage of a sprint: 21 hours; formerly size:33) Jan 17, 2024

landreev self-assigned this Jan 17, 2024

landreev moved this from This Sprint 🏃‍♀️ 🏃 to In Progress 💻 in IQSS Dataverse Project Jan 23, 2024

landreev added a commit that referenced this issue Jan 25, 2024

"stored with header" flag #8524

143d420

landreev added a commit that referenced this issue Jan 26, 2024

more changes for the streaming and redirect code. #8524

ef3bb04

landreev added a commit that referenced this issue Jan 26, 2024

disabling dynamically-generated varheader in the remaining storage drivers. #8524

5f9cda5

landreev added a commit that referenced this issue Jan 26, 2024

Ingest plugins (work in progress) #8524

a6db2a9

landreev added a commit that referenced this issue Jan 29, 2024

R ingest plugin (#8524)

5463e17

landreev added a commit that referenced this issue Jan 29, 2024

work-in-progress, on the subsetting code in the download instance writer. #8524

cfbfd19

landreev added a commit that referenced this issue Jan 29, 2024

more work-in-progress changes. removing all the unused code from TabularSubsetGenerator, for clarity etc. #8524

c30ded3

landreev added a commit that referenced this issue Jan 30, 2024

more bits and pieces #8524

f22aa3e

landreev changed the title ~~Stop storing ingested tabular data files WITHOUT the variable name header~~ Add mechanism for storing ingested tabular data files WITHOUT the variable name header Jan 30, 2024

landreev added a commit that referenced this issue Jan 31, 2024

Integration tests. #8524

50b44b5

landreev mentioned this issue Jan 31, 2024

8524 adding mechanism for storing tab. files with variable headers #10282

Merged

landreev added a commit that referenced this issue Jan 31, 2024

typo #8524

3201eca

landreev removed this from IQSS Dataverse Project Jan 31, 2024

landreev added a commit that referenced this issue Jan 31, 2024

documenting the new setting. #8524

4ef2fca

landreev added a commit that referenced this issue Jan 31, 2024

a release note for the pr. also, added the "storage quotas enabled" to the list of settings documented in the config guide while I was at it. #8524

b81f96c

landreev added a commit that referenced this issue Jan 31, 2024

removed all the unused code from this class (lots of it) for clarity, etc. git history can be consulted if anyone is curious about what we used to do here. #8524

4745301

landreev added a commit that referenced this issue Jan 31, 2024

removing @todo: that's no longer relevant #8524

3dbc8c5

landreev added a commit that referenced this issue Feb 1, 2024

(cosmetic) defined the control constants used in the integration test. #8524

e0dc198

stevenwinship closed this as completed in #10282 Feb 7, 2024

pdurbin added this to the 6.2 milestone Feb 7, 2024
