More sort and finalize fixes #1799
base: master
#### Reference Issues/PRs
Fixes: #1753

#### What does this implement or fix?
Both `finalize_staged_data` and `sort_and_finalize_staged_data` now return `VersionedItem`. A `metadata` parameter was added to `sort_and_finalize_staged_data`.

#### Any other comments?

#### Checklist
<details>
<summary> Checklist for code changes... </summary>

- [ ] Have you updated the relevant docstrings, documentation and copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error messages](https://docs.arcticdb.io/error_messages/)?
- [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?
</details>

<!-- Thanks for contributing a Pull Request to ArcticDB! Please ensure you have taken a look at: - ArcticDB's Code of Conduct: https://github.com/man-group/ArcticDB/blob/master/CODE_OF_CONDUCT.md - ArcticDB's Contribution Licensing: https://github.com/man-group/ArcticDB/blob/master/docs/mkdocs/docs/technical/contributing.md#contribution-licensing -->

---------

Co-authored-by: Vasil Pashov <[email protected]>
…ize_staged_data instead of sort_and_finalize_staged_data
… but different type * Update the tests to reflect how Arctic works with Pandas 1
…ws_with_promotoable_types -> test_type_mismatch_in_staged_segments_throws_with_non_promotoable_types
3586eac to aae39b2
…avoid duplication. Fix typo
ColumnInfo = namedtuple('ColumnInfo', ['name', 'dtype'])

COLUMN_DESCRIPTIONS = [ColumnInfo("a", "float"), ColumnInfo("b", "int64"), ColumnInfo("c", "str"), ColumnInfo("d", "datetime64[ns]")]
COLUMNS = [f"col_{i}" for i in range(0, 5)]
Is it possible to have an unsigned type and a few more columns? Five is very narrow.
Do you have a number in mind? The only concern is that generating too many columns might slow the tests, but we can play with it until we're happy.
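One way to address the request above, sketched under assumptions: generate the column descriptions from a small dtype pool so the width is a single tunable number and at least one unsigned type is always present. The `DTYPE_POOL` list, the `make_column_descriptions` helper, and the choice of 12 columns are hypothetical and not taken from the PR.

```python
from collections import namedtuple

ColumnInfo = namedtuple("ColumnInfo", ["name", "dtype"])

# Hypothetical dtype pool; "uint32" covers the requested unsigned type.
DTYPE_POOL = ["float", "int64", "uint32", "str", "datetime64[ns]"]


def make_column_descriptions(n_columns):
    """Build n_columns descriptions, cycling through DTYPE_POOL."""
    return [
        ColumnInfo(f"col_{i}", DTYPE_POOL[i % len(DTYPE_POOL)])
        for i in range(n_columns)
    ]


# Width is one number, so it is easy to tune if the tests become slow.
COLUMN_DESCRIPTIONS = make_column_descriptions(12)
```

Cycling through the pool keeps the dtype mix stable as the column count changes, which makes test run time the only thing affected by the width.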
Reference Issues/PRs
Fixes #1738
Fixes #1781
Fixes #1466
Fixes #1795
Fixes #1797
Fixes #1807
Fixes #1828
A notable change is that staged writes no longer validate that the index is sorted. The validation now happens when compact_incompletes, finalize_staged_data or sort_and_finalize_staged_data is called. This is because sort_and_finalize_staged_data does not require the segments to be sorted, but the call for adding a staged segment is the same for both paths. We should add a separate call for that.
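The timing of the check can be illustrated with a toy, pure-Python model. This is only a sketch of the behaviour described above, not ArcticDB's implementation: `stage`, `finalize`, `sort_and_finalize` and `UnsortedDataError` are hypothetical stand-ins, segments are plain lists of `(index, value)` rows, and cleanup of staged segments is deliberately not modelled.

```python
class UnsortedDataError(Exception):
    """Hypothetical error type used only in this sketch."""


def stage(staged, segment):
    # Staging accepts any segment: no sortedness check happens here.
    staged.append(list(segment))


def _merge(staged):
    # Combine all rows from all segments, ordered by index.
    rows = [row for segment in staged for row in segment]
    return sorted(rows, key=lambda row: row[0])


def finalize(staged):
    # The sortedness check is deferred to finalize time.
    for segment in staged:
        keys = [key for key, _ in segment]
        if keys != sorted(keys):
            raise UnsortedDataError("staged segment index is not sorted")
    return _merge(staged)


def sort_and_finalize(staged):
    # No precondition on ordering: rows are sorted as part of the call.
    return _merge(staged)
```

Because both finalize paths share the same staging step, the staging step cannot know which precondition applies, which is why the check has to move to finalize time.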
Note also that all incomplete keys for a symbol are deleted if any of the finalize calls fails. The alternative is to leave the segments in place, in which case the user is responsible for calling delete_staged_data.
What does this implement or fix?
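The two failure-handling options mentioned in the note on incomplete keys can be compared with a toy in-memory model. This is a sketch under stated assumptions, not ArcticDB code: `ToyLibrary` is hypothetical, segments are plain lists of index values, and the `delete_on_failure` flag is invented here purely to show both options side by side. Only the method names mirror the calls named in this PR.

```python
class ToyLibrary:
    """Toy stand-in for a library holding staged segments per symbol."""

    def __init__(self):
        self._staged = {}  # symbol -> list of segments (lists of index values)

    def stage(self, symbol, segment):
        self._staged.setdefault(symbol, []).append(list(segment))

    def staged_segment_count(self, symbol):
        return len(self._staged.get(symbol, []))

    def delete_staged_data(self, symbol):
        self._staged.pop(symbol, None)

    def finalize_staged_data(self, symbol, delete_on_failure=True):
        segments = self._staged.get(symbol, [])
        try:
            for segment in segments:
                if segment != sorted(segment):
                    raise ValueError("staged segment is not sorted")
            merged = sorted(value for seg in segments for value in seg)
        except ValueError:
            # Option 1 (described in this PR): drop all staged segments.
            # Option 2: leave them and let the caller clean up.
            if delete_on_failure:
                self.delete_staged_data(symbol)
            raise
        # On success the staged segments are consumed.
        self.delete_staged_data(symbol)
        return merged
```

With `delete_on_failure=False` the caller has to follow a failed finalize with an explicit `delete_staged_data(symbol)`, otherwise the bad segment is picked up again by the next finalize call.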
Any other comments?
Checklist
Checklist for code changes...