More sort and finalize fixes #1799

vasil-pashov · 2024-08-29T15:54:12Z

Reference Issues/PRs

Fixes #1738
Fixes #1781
Fixes #1466
Fixes #1795
Fixes #1797
Fixes #1807
Fixes #1828

A notable change is that staged writes no longer validate that the index is sorted. Validation now happens when compact_incompletes, finalize_staged_data, or sort_and_finalize_staged_data is called. The reason is that sort_and_finalize_staged_data does not require the staged segments to be sorted, yet it shares the call for adding a staged segment with the other finalize methods. We should add a separate call for that.
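To make the new timing of the check concrete, here is a minimal toy model, not ArcticDB code. `ToyStagingArea`, `UnsortedDataError`, and the list-of-integers "segments" are all assumptions made for illustration. It only shows the behaviour described above: staging accepts anything, a plain finalize rejects unsorted segments, and sort-and-finalize accepts them.

```python
from dataclasses import dataclass, field


class UnsortedDataError(Exception):
    """Hypothetical error type used by this toy model only."""


@dataclass
class ToyStagingArea:
    """Toy stand-in: each staged segment is a list of integer index values."""
    segments: list = field(default_factory=list)

    def stage(self, index_values):
        # No sortedness check here: the same call feeds both finalize variants.
        self.segments.append(list(index_values))

    def finalize(self):
        # Validation is deferred to this point: every segment must be sorted.
        # What happens to the staged segments after a failure is a separate
        # question and is not modelled here.
        for seg in self.segments:
            if any(a > b for a, b in zip(seg, seg[1:])):
                raise UnsortedDataError(f"staged segment {seg} is not sorted")
        # Order segments by their first index value and concatenate them.
        ordered = sorted(self.segments, key=lambda seg: seg[:1])
        self.segments = []
        return [value for seg in ordered for value in seg]

    def sort_and_finalize(self):
        # Sorting is part of the operation, so unsorted input is acceptable.
        merged = sorted(value for seg in self.segments for value in seg)
        self.segments = []
        return merged
```

In this sketch, `stage([3, 1, 2])` succeeds on any `ToyStagingArea`. A subsequent `finalize()` raises `UnsortedDataError`, while `sort_and_finalize()` on a fresh area holding the same segment returns the values in sorted order.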

Note also that all incomplete keys for a symbol are deleted if any of the finalize calls fails. The alternative is to leave the segments in place, in which case the user is responsible for calling delete_staged_data.
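The two options can be sketched as follows. This is a toy model under stated assumptions, not the code in this PR: the `staged` dict stands in for the incomplete keys of each symbol, and `finalize_with_cleanup`, `compact_sorted_only`, and the `delete_on_failure` flag are hypothetical names introduced only for this example.

```python
def finalize_with_cleanup(staged, symbol, compact, delete_on_failure=True):
    """Compact the staged segments of `symbol`, cleaning up according to the flag.

    staged: dict mapping symbol -> list of staged segments (toy stand-in).
    compact: callable that merges the segments and may raise.
    delete_on_failure: hypothetical switch between the two options above.
    """
    segments = staged.get(symbol, [])
    try:
        result = compact(segments)
    except Exception:
        if delete_on_failure:
            # Option 1: drop every staged segment for the symbol, then re-raise.
            staged.pop(symbol, None)
        # Option 2 (flag off): leave the segments; the caller must delete them.
        raise
    # On success the staged segments are consumed either way.
    staged.pop(symbol, None)
    return result


def compact_sorted_only(segments):
    """Toy compaction that rejects data which is not already sorted."""
    merged = [value for seg in segments for value in seg]
    if merged != sorted(merged):
        raise ValueError("staged data is not sorted")
    return merged
```

With the flag on, a failed call leaves nothing staged for the symbol. With it off, the segments survive the failure and the caller has to remove them explicitly, which is the role delete_staged_data plays in the second option.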

What does this implement or fix?

Any other comments?

Checklist

Checklist for code changes...

Have you updated the relevant docstrings, documentation and copyright notice?
Is this contribution tested against all ArcticDB's features?
Do all exceptions introduced raise appropriate error messages?
Are API changes highlighted in the PR description?
Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

#### Reference Issues/PRs

Fixes: #1753

#### What does this implement or fix?

Both `finalize_staged_data` and `sort_and_finalize_staged_data` now return `VersionedItem`. `metadata` parameter was added to `sort_and_finalize_staged_data`

#### Any other comments?

#### Checklist

<details>
<summary> Checklist for code changes... </summary>

- [ ] Have you updated the relevant docstrings, documentation and copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error messages](https://docs.arcticdb.io/error_messages/)?
- [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

</details>

---------

Co-authored-by: Vasil Pashov <[email protected]>
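A caller might use the new return value roughly as sketched below. The helper `finalize_and_describe` and the `FakeLibrary` stub are hypothetical and exist only so the snippet runs without a storage backend. The sketch assumes that `metadata` can be passed by keyword and that the returned `VersionedItem` exposes `symbol`, `version`, and `metadata` attributes.

```python
from types import SimpleNamespace


def finalize_and_describe(lib, symbol, metadata=None):
    """Finalize staged data and summarise the returned item.

    `lib` is any object offering sort_and_finalize_staged_data(symbol, metadata=...),
    which is assumed here to return an object with symbol/version/metadata.
    """
    item = lib.sort_and_finalize_staged_data(symbol, metadata=metadata)
    return {"symbol": item.symbol, "version": item.version, "metadata": item.metadata}


class FakeLibrary:
    """Stand-in used only for this example; it is not an ArcticDB class."""

    def sort_and_finalize_staged_data(self, symbol, metadata=None):
        # Mimics the described behaviour: return an item instead of None.
        return SimpleNamespace(symbol=symbol, version=0, metadata=metadata)


summary = finalize_and_describe(FakeLibrary(), "prices", metadata={"source": "demo"})
```

The practical benefit is that callers no longer need a separate read to learn which version the finalize call produced.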

python/tests/unit/arcticdb/version_store/test_sort_merge.py

cpp/arcticdb/stream/merge.hpp

cpp/arcticdb/version/version_core.cpp

python/tests/hypothesis/arcticdb/test_sort_merge.py

willdealtry · 2024-09-19T11:19:06Z

python/tests/hypothesis/arcticdb/test_sort_merge.py


 ColumnInfo = namedtuple('ColumnInfo', ['name', 'dtype'])

-COLUMN_DESCRIPTIONS = [ColumnInfo("a", "float"), ColumnInfo("b", "int64"), ColumnInfo("c", "str"), ColumnInfo("d", "datetime64[ns]")]
+COLUMNS = [f"col_{i}" for i in range(0, 5)]


Is it possible to have an unsigned type and a few more columns? Five is very narrow.

vasil-pashov · 2024-09-19

Do you have a number in mind? My only concern is that generating too many columns might slow the tests down, but we can experiment until we're happy.

Vasil Pashov and others added 22 commits August 14, 2024 11:57

Fix having empty dataframe in staged writes

65d151a

Throw exception when there is an empty staging area

ddf9a96

Fix failing test

2bede7c

Fix compilation errors

07c2dd1

Apply fixes for empty dfs in staged writes with vanilla finalize

19250a1

Address review comments

3f85d61

Add comments for unreachable code as per review request

8db2b78

Add comments as per review request

ff546ee

Throw when trying to compact unordered incomplete segments with finalize_staged_data instead of sort_and_finalize_staged_data

39baaf2

More fixes for sort_and_finalize

583ef39

Fixes for sort and finalize

546a860

Add tests for schema mismatch in finalize_staged_data

585cb07

Merge branch 'master' into sort-and-finalize-sorting

4da043a

Fix errors from merge commit

e5fdb1c

Fix CI compilation error

e621b9f

Fix compilation errors in tests

2aa8497

Fix failing C++ tests

9ec7fca

Fix compilation errors

f672906

Merge branch 'master' into sort-and-finalize-sorting

4f0d36b

Fix failing c++ tests

6b09f3c

Add fixes for dynamic schema and staged writes

91cca50

vasil-pashov marked this pull request as ready for review August 30, 2024 20:30

vasil-pashov requested review from alexowens90, willdealtry and poodlewars as code owners August 30, 2024 20:30

Fixing failing tests

3405b3a

vasil-pashov marked this pull request as draft September 2, 2024 20:14

Vasil Pashov added 2 commits September 3, 2024 00:53

* Make it possible to have columns in the staged segments with common but different type
* Update the tests to reflect how Arctic works with Pandas 1

32bfcc7

Fix C++ tests

2462d1f

Vasil Pashov added 11 commits September 10, 2024 12:02

Check values in test_repeating_index_values

2a09829

Fix duplicated col_1 in test_appending_reordered_column_set_throws

14ab952

Fix typo in test case name test_type_mismatch_in_staged_segments_throws_with_promotoable_types -> test_type_mismatch_in_staged_segments_throws_with_non_promotoable_types

819d2a5

Fix duplicated col_1 in test_staged_segments_cant_be_reordered

ddb83a6

Explicit dtypes for all DFs in TestStreamDescriptorMismatchOnFinalizeAppend

f7c7746

Move test_two_columns_with_different_dtypes to nonreg tests

0336091

More test cases for NaT values

bff0fdd

Type promotion checks for sort merge and dynamic schema

7b4ae50

Move v1 TestFinalizeStagedDataStaticSchemaMismatch to test_parallel.py

1d7e76d

Move TestFinalizeWithEmptySegments for v1 API in test_parallel.py

495ec6d

Remove has_common_valid_type from merge function

aae39b2

vasil-pashov force-pushed the more-sort-and-finalize-fixes branch from 3586eac to aae39b2 September 11, 2024 16:23

Vasil Pashov added 7 commits September 11, 2024 20:56

Simplify merge types checks and fix date_and_time

215cd3b

Rework hypothesis tests for sort and finalize. TBD: dynamic schema append

98aef2b

Rework dynamic schema sort and finalize append tests

5b98440

Fix hypothesis tests

5ef9514

Fix merge clause test

beac0bb

Fix a typo

d30124c

Allow for UNKNOWN sorted value in do_compact

575c07e

alexowens90 requested changes Sep 17, 2024

Vasil Pashov added 6 commits September 17, 2024 12:41

Move no append keys assertion in the functions that assert errors to avoid duplication. Fix typo

5edea0e

Use ScalarTypeInfo

8cd0ac3

Fix test name

d76e0e3

Fix compilation errors on linux

82cc9e9

Remove empty columns from staged segments instead of filling them with default values.

006f78a

Do not try dropping empty columns when static schema is used in merge sort

cef88e4

vasil-pashov mentioned this pull request Sep 19, 2024

Add clear keys parameter for finalize and sort and finalize staged data methods controlling what happens if an exception is thrown #1839

Open

willdealtry reviewed Sep 19, 2024

Merge branch 'master' into more-sort-and-finalize-fixes

8999834

alexowens90 approved these changes Sep 19, 2024
