
Refactor 1833:use entt as ecs #1834

Merged 1 commit into master on Oct 11, 2024

Conversation

alexowens90 (Collaborator)

Reference Issues/PRs

Closes #1833

What does this implement or fix?

Introduces ENTT as the underlying datastore for entities in the ComponentManager
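
For context, below is a minimal sketch of what storing per-entity components in an `entt::registry` looks like. The registry calls are real EnTT API, but the component types and surrounding structure are illustrative stand-ins, not ArcticDB's actual ComponentManager:

```cpp
// Minimal sketch of EnTT as an entity/component datastore.
// Component types here are stand-ins; the registry calls are EnTT API.
#include <entt/entt.hpp>
#include <cstddef>
#include <memory>

struct SegmentInMemory { /* column data, index, ... */ };
struct RowRange { std::size_t start_, end_; };
struct ColRange { std::size_t start_, end_; };

int main() {
    entt::registry registry;

    // Each unit of work becomes an entity, with components attached by type
    const entt::entity entity = registry.create();
    registry.emplace<std::shared_ptr<SegmentInMemory>>(entity, std::make_shared<SegmentInMemory>());
    registry.emplace<std::shared_ptr<RowRange>>(entity, std::make_shared<RowRange>(RowRange{0, 100000}));
    registry.emplace<std::shared_ptr<ColRange>>(entity, std::make_shared<ColRange>(ColRange{0, 10}));

    // Components are retrieved purely by entity id and component type
    auto& row_range = registry.get<std::shared_ptr<RowRange>>(entity);
    (void)row_range;
    return 0;
}
```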

@alexowens90 alexowens90 changed the title Refactor/1833/use entt as ecs WIP: Refactor 1833:use entt as ecs Sep 13, 2024
@alexowens90 alexowens90 self-assigned this Sep 13, 2024
@alexowens90 alexowens90 marked this pull request as draft September 13, 2024 16:01
@alexowens90 alexowens90 changed the title WIP: Refactor 1833:use entt as ecs Refactor 1833:use entt as ecs Sep 23, 2024
@alexowens90 alexowens90 marked this pull request as ready for review September 23, 2024 08:47
```
processing/test/test_filter_and_project_sparse.cpp
processing/test/test_has_valid_type_promotion.cpp
processing/test/test_operation_dispatch.cpp
# async/test/test_async.cpp
```
Collaborator: orly?

```diff
@@ -272,14 +295,15 @@ struct PartitionClause {
         if (entity_ids.empty()) {
             return {};
         }
-        auto proc = gather_entities(component_manager_, std::move(entity_ids));
+        auto proc = gather_entities<std::shared_ptr<SegmentInMemory>, std::shared_ptr<RowRange>, std::shared_ptr<ColRange>>(*component_manager_, std::move(entity_ids));
```
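
The new call site is variadic over the component types to fetch. As a hypothetical sketch (names, signature, and return type are illustrative, not the merged implementation), such a gather over an `entt::registry` could look like:

```cpp
// Hypothetical sketch only: collect the requested component types for each
// entity id, in order. Not the gather_entities signature merged in this PR.
#include <entt/entt.hpp>
#include <tuple>
#include <vector>

template <typename... Components>
std::vector<std::tuple<Components...>> gather_entities(
        entt::registry& registry,
        std::vector<entt::entity>&& entity_ids) {
    std::vector<std::tuple<Components...>> components;
    components.reserve(entity_ids.size());
    for (const auto entity : entity_ids) {
        // registry.get<T>(entity) returns a reference to that entity's T;
        // the pack expansion fetches one component of each requested type
        components.emplace_back(registry.get<Components>(entity)...);
    }
    return components;
}
```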
Collaborator: Do we need to have shared_ptrs to SegmentInMemory, given that SegmentInMemory is already a shared_ptr?
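
The concern is double indirection: assuming `SegmentInMemory` already has shared-pointer semantics internally (an assumption based on the comment above, e.g. a pimpl holding a `shared_ptr`), wrapping it in another `shared_ptr` adds a second control block and level of reference counting. A sketch of the shape being questioned:

```cpp
// Illustration of the review comment, assuming SegmentInMemory shares its
// data internally via a shared_ptr (pimpl-style); this is an assumption.
#include <memory>

struct SegmentData { /* columns, index, ... */ };

class SegmentInMemory {
    std::shared_ptr<SegmentData> data_ = std::make_shared<SegmentData>();
public:
    // Copies are cheap and share the underlying data, like a shared_ptr does
};

// The component type the diff introduces: a shared_ptr to a type that already
// behaves like one, i.e. two levels of reference counting per component.
using SegmentComponent = std::shared_ptr<SegmentInMemory>;
```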

@alexowens90 alexowens90 merged commit 75486dd into master Oct 11, 2024
116 checks passed
@alexowens90 alexowens90 deleted the refactor/1833/use-entt-as-ecs branch October 11, 2024 10:04
alexowens90 added a commit that referenced this pull request Oct 11, 2024
To be rebased after #1834 is merged

#### Reference Issues/PRs
Closes #1721 
Closes #245 

#### Performance

Benchmarked using 8 cores, with mimalloc preloaded and LMDB as the storage backend. Data is of the form:
```
                        tick type       bid       ask
2020-01-01 08:00:00.000       ASK       NaN  0.291217
2020-01-01 08:00:00.001       BID  0.271128       NaN
2020-01-01 08:00:00.002       ASK       NaN  0.664834
2020-01-01 08:00:00.003       ASK       NaN  0.098223
2020-01-01 08:00:00.004       BID  0.751502       NaN
```
i.e. `tick type` is a string column containing "BID" or "ASK" with equal probability, and the `bid` and `ask` columns contain random floats between 0 and 1 when the tick type matches the column name, and `NaN` otherwise.

- 1 tick every millisecond (60k ticks per minute) 
- 24m ticks per day (8 hours)
- 6B ticks per year (250 days)
- ~100GB on disk (randomness and `NaNs` compress poorly, raw data is
~179GB)
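
As a rough illustration of the data described above, a standalone sketch (hypothetical, not the actual benchmark generator) that emits ticks of this shape:

```cpp
// Rough sketch of the benchmark data: one tick per millisecond, tick type
// BID/ASK with equal probability, a price in [0, 1) on the matching column
// and NaN on the other. Not the actual benchmark harness.
#include <cstdio>
#include <random>
#include <string>

int main() {
    std::mt19937_64 rng{0};
    std::bernoulli_distribution is_bid{0.5};
    std::uniform_real_distribution<double> price{0.0, 1.0};

    for (int ms = 0; ms < 5; ++ms) {  // 5 ticks shown; the benchmark uses 6B
        const bool bid = is_bid(rng);
        const std::string p = std::to_string(price(rng));
        std::printf("t+%03dms  %s  bid=%s  ask=%s\n", ms,
                    bid ? "BID" : "ASK",
                    bid ? p.c_str() : "NaN",
                    bid ? "NaN" : p.c_str());
    }
    return 0;
}
```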

Performance (with default 100k rows per segment):

- Reading (6B is all data, 3B is with half the date range)
    - Reading 6B ticks took 28.9s
    - Reading 3B ticks took 13.3s
    - i.e. scales linearly in date range covered
- Filtering on `tick type` column to one of "BID" or "ASK"
    - Filtering 6B ticks took 42.7s
    - Filtering 3B ticks took 20.7s
    - i.e. scales linearly in date range covered, ~50% slower than raw reading time
- Resampling down to minute frequency, taking the max of the `bid` column
    - Resampling 6B ticks to 100,000 mins took 19.s
    - Resampling 3B ticks to 50,000 mins took 9.7s
    - i.e. scales linearly in date range covered, ~33% faster than raw reading time
- Combination of filter and resample described above
    - Filtering then resampling 6B ticks to 100,000 mins took 39.1s
    - Filtering then resampling 3B ticks to 50,000 mins took 19.3s
    - i.e. scales linearly in date range covered, ~40% slower than raw reading time

Restructuring after the filter and before the resample takes ~100ms for 6B ticks (i.e. 0.25% of the total time).
Tail latency introduced by the restructuring "stop the world" approach is ~2ms in this example (the time to filter one segment).

Everything is ~10% faster with 1M rows per segment.