Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More sort and finalize fixes #1799

Open
wants to merge 72 commits into
base: master
Choose a base branch
from

Conversation

vasil-pashov
Copy link
Collaborator

@vasil-pashov vasil-pashov commented Aug 29, 2024

Reference Issues/PRs

Fixes #1738
Fixes #1781
Fixes #1466
Fixes #1795
Fixes #1797
Fixes #1807
Fixes #1828

A notable change is that staged writes no longer validate the index is sorted. The validation is done at the moment compact_incompletes/finalize_staged_data/sort_and_finalize_staged_data is called. This is because sort_and_finalize_staged_data does not require the segments to be sorted, but the call for adding a staged segment is the same. We should add a separate call for that.

Note also that all incomplete keys for a symbol are deleted if any of the finalize calls fail. The other option is to leave the segments. In that case the user will have the responsibility of calling delete_staged_data.

What does this implement or fix?

Any other comments?

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

Vasil Pashov and others added 22 commits August 14, 2024 11:57
#### Reference Issues/PRs
Fixes:  #1753

#### What does this implement or fix?
Both `finalize_staged_data` and `sort_and_finalize_staged_data` now
return `VersionedItem`.

`metadata` parameter was added to `sort_and_finalize_staged_data`
#### Any other comments?

#### Checklist

<details>
  <summary>
   Checklist for code changes...
  </summary>
 
- [ ] Have you updated the relevant docstrings, documentation and
copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's
features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error
messages](https://docs.arcticdb.io/error_messages/)?
 - [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in
autogenerated release notes?
</details>

<!--
Thanks for contributing a Pull Request to ArcticDB! Please ensure you
have taken a look at:
- ArcticDB's Code of Conduct:
https://github.com/man-group/ArcticDB/blob/master/CODE_OF_CONDUCT.md
- ArcticDB's Contribution Licensing:
https://github.com/man-group/ArcticDB/blob/master/docs/mkdocs/docs/technical/contributing.md#contribution-licensing
-->

---------

Co-authored-by: Vasil Pashov <[email protected]>
…ize_staged_data instead of sort_and_finalize_staged_data
@vasil-pashov vasil-pashov marked this pull request as ready for review August 30, 2024 20:30
@vasil-pashov vasil-pashov marked this pull request as draft September 2, 2024 20:14
Vasil Pashov added 2 commits September 3, 2024 00:53
… but

different type
* Update the tests to reflect how Arctic works with Pandas 1
cpp/arcticdb/stream/merge.hpp Outdated Show resolved Hide resolved
cpp/arcticdb/version/version_core.cpp Show resolved Hide resolved
cpp/arcticdb/version/version_core.cpp Outdated Show resolved Hide resolved
python/tests/hypothesis/arcticdb/test_sort_merge.py Outdated Show resolved Hide resolved
python/tests/hypothesis/arcticdb/test_sort_merge.py Outdated Show resolved Hide resolved

ColumnInfo = namedtuple('ColumnInfo', ['name', 'dtype'])

COLUMN_DESCRIPTIONS = [ColumnInfo("a", "float"), ColumnInfo("b", "int64"), ColumnInfo("c", "str"), ColumnInfo("d", "datetime64[ns]")]
COLUMNS = [f"col_{i}" for i in range(0, 5)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to have an unsigned type and a few more columns? Five is very narrow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any number on your mind? The only concern is that generating too many columns might slow the tests but we can play with it until we're happy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment