Add mechanism for storing ingested tabular data files WITHOUT the variable name header #8524

Closed
landreev opened this issue Mar 22, 2022 · 0 comments · Fixed by #10282

As currently implemented, the tabular file in storage contains only the data rows. The header is added in real time, every time the file is downloaded, and is generated from the DataVariable objects in the database. There was some ancient legacy reason for this implementation, and it's not clear whether it's still relevant. Storing the file with the header included would simplify the download framework a whole lot.
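
To make the difference concrete, here is a minimal sketch of the two download paths, not the actual Dataverse code. The class name `VarHeaderSketch`, the methods `buildHeader` and `openForDownload`, and the boolean `storedWithHeader` parameter are all hypothetical names chosen for illustration; the variable names stand in for what would come from the DataVariable objects.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical illustration only; none of these names come from the Dataverse codebase.
class VarHeaderSketch {

    // Build a tab-delimited header line from the variable names
    // (assumed here to be read from the database in column order).
    static String buildHeader(List<String> variableNames) {
        return String.join("\t", variableNames) + "\n";
    }

    // Return the stream to send to the client. If the stored file already
    // contains the header, stream it as-is; otherwise prepend a header
    // generated on the fly, which is the behavior described above.
    static InputStream openForDownload(InputStream stored,
                                       boolean storedWithHeader,
                                       List<String> variableNames) {
        if (storedWithHeader) {
            return stored;
        }
        InputStream header = new ByteArrayInputStream(
                buildHeader(variableNames).getBytes(StandardCharsets.UTF_8));
        return new SequenceInputStream(header, stored);
    }

    public static void main(String[] args) throws IOException {
        InputStream dataRowsOnly = new ByteArrayInputStream(
                "1\t42\n2\t37\n".getBytes(StandardCharsets.UTF_8));
        InputStream download = openForDownload(dataRowsOnly, false, List.of("id", "age"));
        System.out.print(new String(download.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

With the header stored in the file, the first branch is the only one needed, which is where the simplification would come from.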

Potential drawbacks:

  1. It would require a migration process to add the header to any existing tab files.
  2. If we ever add a mechanism for changing the names of data variables, the stored file would need to be updated (the next time the dataset is published, for example?) to reflect the changes in the header.
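
A rough sketch of what the migration step in drawback 1 could look like for a file on a local filesystem follows. It is an assumption-laden illustration, not the project's migration code: `HeaderMigrationSketch`, `addHeaderIfMissing`, and the `storedWithHeader` argument are hypothetical, and other storage drivers (such as object storage) would need their own handling.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical illustration only; none of these names come from the Dataverse codebase.
class HeaderMigrationSketch {

    // Rewrite a stored tab file so that it starts with the variable name header.
    // Returns true if the file was rewritten, in which case the caller would
    // also need to record that the file is now stored with its header.
    static boolean addHeaderIfMissing(Path tabFile,
                                      boolean storedWithHeader,
                                      List<String> variableNames) throws IOException {
        if (storedWithHeader) {
            return false; // already migrated, nothing to do
        }
        // Write header + original rows to a temp file in the same directory,
        // then swap it in, so a failure midway leaves the original untouched.
        Path dir = tabFile.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "tab", ".tmp");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            String header = String.join("\t", variableNames) + "\n";
            out.write(header.getBytes(StandardCharsets.UTF_8));
            Files.copy(tabFile, out);
        }
        Files.move(tmp, tabFile, StandardCopyOption.REPLACE_EXISTING);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("example", ".tab");
        Files.writeString(file, "1\t42\n2\t37\n");
        addHeaderIfMissing(file, false, List.of("id", "age"));
        System.out.print(Files.readString(file));
        Files.delete(file);
    }
}
```

The same routine could in principle cover drawback 2 as well, if a renamed variable were handled by replacing the first line rather than prepending one; that variant is not shown here.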
@cmbz cmbz moved this to Release 6.1 Proposals in IQSS Dataverse Project Oct 11, 2023
@cmbz cmbz moved this to This Sprint 🏃‍♀️ 🏃 in IQSS Dataverse Project Jan 17, 2024
@landreev landreev added the Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) label Jan 17, 2024
@landreev landreev self-assigned this Jan 17, 2024
@landreev landreev moved this from This Sprint 🏃‍♀️ 🏃 to In Progress 💻 in IQSS Dataverse Project Jan 23, 2024
landreev added a commit that referenced this issue Jan 25, 2024
landreev added a commit that referenced this issue Jan 29, 2024
landreev added a commit that referenced this issue Jan 29, 2024
landreev added a commit that referenced this issue Jan 30, 2024
@landreev landreev changed the title Stop storing ingested tabular data files WITHOUT the variable name header Add mechanism for storing ingested tabular data files WITHOUT the variable name header Jan 30, 2024
landreev added a commit that referenced this issue Jan 31, 2024
landreev added a commit that referenced this issue Jan 31, 2024
landreev added a commit that referenced this issue Jan 31, 2024
landreev added a commit that referenced this issue Jan 31, 2024
landreev added a commit that referenced this issue Jan 31, 2024
stevenwinship pushed a commit that referenced this issue Feb 7, 2024
…10282)

* "stored with header" flag #8524

* more changes for the streaming and redirect code. #8524

* disabling dynamically-generated varheader in the remaining storage drivers. #8524

* Ingest plugins (work in progress) #8524

* R ingest plugin (#8524)

* still some unaddressed @todo:s, but the branch should build and the unit tests should be passing. # 8524

* work-in-progress, on the subsetting code in the download instance writer. #8524

* more work-in-progress changes. removing all the unused code from TabularSubsetGenerator, for clarity etc. #8524

* more bits and pieces #8524

* 2 more ingest plugins. #8542

* Integration tests. #8524

* typo #8524

* documenting the new setting. #8524

* a release note for the pr. also, added the "storage quotas enabled" to the list of settings documented in the config guide while I was at it. #8524

* removed all the unused code from this class (lots of it) for clarity, etc. git history can be consulted if anyone is curious about what we used to do here. #8524

* removing @todo: that's no longer relevant #8524

* (cosmetic) defined the control constants used in the integration test. #8524
@pdurbin pdurbin added this to the 6.2 milestone Feb 7, 2024