
Refactor 1833:use entt as ecs #1834

Merged 1 commit into master on Oct 11, 2024

Conversation

alexowens90 (Collaborator)

Reference Issues/PRs

Closes #1833

What does this implement or fix?

Introduces ENTT as the underlying datastore for entities in the ComponentManager
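
For context, below is a minimal sketch of what storing per-entity components in an `entt::registry` looks like. The registry calls are real EnTT API, but the component types and surrounding structure are illustrative stand-ins, not ArcticDB's actual ComponentManager:

```cpp
// Minimal sketch of EnTT as an entity/component datastore.
// Component types here are stand-ins; the registry calls are EnTT API.
#include <entt/entt.hpp>
#include <cstddef>
#include <memory>

struct SegmentInMemory { /* column data, index, ... */ };
struct RowRange { std::size_t start_, end_; };
struct ColRange { std::size_t start_, end_; };

int main() {
    entt::registry registry;

    // Each unit of work becomes an entity, with components attached by type
    const entt::entity entity = registry.create();
    registry.emplace<std::shared_ptr<SegmentInMemory>>(entity, std::make_shared<SegmentInMemory>());
    registry.emplace<std::shared_ptr<RowRange>>(entity, std::make_shared<RowRange>(RowRange{0, 100000}));
    registry.emplace<std::shared_ptr<ColRange>>(entity, std::make_shared<ColRange>(ColRange{0, 10}));

    // Components are retrieved purely by entity id and component type
    auto& row_range = registry.get<std::shared_ptr<RowRange>>(entity);
    (void)row_range;
    return 0;
}
```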

@alexowens90 alexowens90 changed the title Refactor/1833/use entt as ecs WIP: Refactor 1833:use entt as ecs Sep 13, 2024
@alexowens90 alexowens90 self-assigned this Sep 13, 2024
@alexowens90 alexowens90 marked this pull request as draft September 13, 2024 16:01
@alexowens90 alexowens90 changed the title WIP: Refactor 1833:use entt as ecs Refactor 1833:use entt as ecs Sep 23, 2024
@alexowens90 alexowens90 marked this pull request as ready for review September 23, 2024 08:47
```
processing/test/test_filter_and_project_sparse.cpp
processing/test/test_has_valid_type_promotion.cpp
processing/test/test_operation_dispatch.cpp
# async/test/test_async.cpp
```
Collaborator: orly?

```diff
@@ -272,14 +295,15 @@ struct PartitionClause {
         if (entity_ids.empty()) {
             return {};
         }
-        auto proc = gather_entities(component_manager_, std::move(entity_ids));
+        auto proc = gather_entities<std::shared_ptr<SegmentInMemory>, std::shared_ptr<RowRange>, std::shared_ptr<ColRange>>(*component_manager_, std::move(entity_ids));
```
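
The new call site is variadic over the component types to fetch. As a hypothetical sketch (names, signature, and return type are illustrative, not the merged implementation), such a gather over an `entt::registry` could look like:

```cpp
// Hypothetical sketch only: collect the requested component types for each
// entity id, in order. Not the gather_entities signature merged in this PR.
#include <entt/entt.hpp>
#include <tuple>
#include <vector>

template <typename... Components>
std::vector<std::tuple<Components...>> gather_entities(
        entt::registry& registry,
        std::vector<entt::entity>&& entity_ids) {
    std::vector<std::tuple<Components...>> components;
    components.reserve(entity_ids.size());
    for (const auto entity : entity_ids) {
        // registry.get<T>(entity) returns a reference to that entity's T;
        // the pack expansion fetches one component of each requested type
        components.emplace_back(registry.get<Components>(entity)...);
    }
    return components;
}
```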
Collaborator: Do we need to have shared_ptrs to SegmentInMemory, given that SegmentInMemory is already a shared_ptr?
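
The concern is double indirection: assuming `SegmentInMemory` already has shared-pointer semantics internally (an assumption based on the comment above, e.g. a pimpl holding a `shared_ptr`), wrapping it in another `shared_ptr` adds a second control block and level of reference counting. A sketch of the shape being questioned:

```cpp
// Illustration of the review comment, assuming SegmentInMemory shares its
// data internally via a shared_ptr (pimpl-style); this is an assumption.
#include <memory>

struct SegmentData { /* columns, index, ... */ };

class SegmentInMemory {
    std::shared_ptr<SegmentData> data_ = std::make_shared<SegmentData>();
public:
    // Copies are cheap and share the underlying data, like a shared_ptr does
};

// The component type the diff introduces: a shared_ptr to a type that already
// behaves like one, i.e. two levels of reference counting per component.
using SegmentComponent = std::shared_ptr<SegmentInMemory>;
```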

@alexowens90 alexowens90 merged commit 75486dd into master Oct 11, 2024
116 checks passed
@alexowens90 alexowens90 deleted the refactor/1833/use-entt-as-ecs branch October 11, 2024 10:04
alexowens90 added a commit that referenced this pull request Oct 11, 2024
To be rebased after #1834 is merged

#### Reference Issues/PRs
Closes #1721 
Closes #245 

#### Performance

Benchmarked using 8 cores, with mimalloc preloaded and LMDB as the storage backend. Data is of the form:
```
                        tick type       bid       ask
2020-01-01 08:00:00.000       ASK       NaN  0.291217
2020-01-01 08:00:00.001       BID  0.271128       NaN
2020-01-01 08:00:00.002       ASK       NaN  0.664834
2020-01-01 08:00:00.003       ASK       NaN  0.098223
2020-01-01 08:00:00.004       BID  0.751502       NaN
```
i.e. `tick type` is a string column containing "BID" or "ASK" with equal probability, and the `bid` and `ask` columns contain random floats between 0 and 1 when the tick type matches the column name, and `NaN` otherwise.

- 1 tick every millisecond (60k ticks per minute) 
- 24m ticks per day (8 hours)
- 6B ticks per year (250 days)
- ~100GB on disk (randomness and `NaNs` compress poorly, raw data is
~179GB)
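
As a rough illustration of the data described above, a standalone sketch (hypothetical, not the actual benchmark generator) that emits ticks of this shape:

```cpp
// Rough sketch of the benchmark data: one tick per millisecond, tick type
// BID/ASK with equal probability, a price in [0, 1) on the matching column
// and NaN on the other. Not the actual benchmark harness.
#include <cstdio>
#include <random>
#include <string>

int main() {
    std::mt19937_64 rng{0};
    std::bernoulli_distribution is_bid{0.5};
    std::uniform_real_distribution<double> price{0.0, 1.0};

    for (int ms = 0; ms < 5; ++ms) {  // 5 ticks shown; the benchmark uses 6B
        const bool bid = is_bid(rng);
        const std::string p = std::to_string(price(rng));
        std::printf("t+%03dms  %s  bid=%s  ask=%s\n", ms,
                    bid ? "BID" : "ASK",
                    bid ? p.c_str() : "NaN",
                    bid ? "NaN" : p.c_str());
    }
    return 0;
}
```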

Performance (with default 100k rows per segment):

- Reading (6B is all data, 3B is with half the date range)
    - Reading 6B ticks took 28.9s
    - Reading 3B ticks took 13.3s
    - i.e. scales linearly in date range covered
- Filtering on `tick type` column to one of "BID" or "ASK"
    - Filtering 6B ticks took 42.7s
    - Filtering 3B ticks took 20.7s
    - i.e. scales linearly in date range covered, ~50% slower than raw reading time
- Resampling down to minute frequency, taking the max of the `bid` column
    - Resampling 6B ticks to 100,000 mins took 19.s
    - Resampling 3B ticks to 50,000 mins took 9.7s
    - i.e. scales linearly in date range covered, ~33% faster than raw reading time
- Combination of filter and resample described above
    - Filtering then resampling 6B ticks to 100,000 mins took 39.1s
    - Filtering then resampling 3B ticks to 50,000 mins took 19.3s
    - i.e. scales linearly in date range covered, ~40% slower than raw reading time

Restructuring after the filter and before the resample takes ~100ms for 6B ticks (i.e. 0.25% of the total time).
Tail latency introduced by the restructuring "stop the world" approach is ~2ms in this example (the time to filter one segment).

Everything is ~10% faster with 1M rows per segment.