Get entity by attribute #4311

Merged: 1 commit, Aug 29, 2024

jhrozek (Contributor) commented Aug 29, 2024

Summary

Adds a utility database function to look up entities by an attribute
(property) value, such as the upstream ID. Because entity properties are
stored as JSON, this also adds a GIN index on the property value.

Testing with 50,000 entities, the lookup performs as follows with the index:

Nested Loop  (cost=30.39..109.45 rows=5 width=87) (actual time=0.290..0.292 rows=1 loops=1)
  ->  Bitmap Heap Scan on properties p  (cost=30.10..67.90 rows=5 width=16) (actual time=0.216..0.218 rows=1 loops=1)
        Recheck Cond: (value @> '{"value": "MBymWMNbcv", "version": "v1"}'::jsonb)
        Filter: (key = 'upstream_id'::text)
        Heap Blocks: exact=1
        ->  Bitmap Index Scan on idx_properties_value_gin  (cost=0.00..30.10 rows=10 width=0) (actual time=0.174..0.175 rows=1 loops=1)
              Index Cond: (value @> '{"value": "MBymWMNbcv", "version": "v1"}'::jsonb)
  ->  Index Scan using entity_instances_pkey on entity_instances ei  (cost=0.29..8.31 rows=1 width=87) (actual time=0.069..0.069 rows=1 loops=1)
        Index Cond: (id = p.entity_id)
        Filter: (entity_type = 'repository'::entities)
Planning Time: 2.365 ms
Execution Time: 0.590 ms
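
A query of roughly the following shape would yield a plan like the one above. This is a sketch reconstructed from the plan output, not the statement from the PR itself: the table, column, and index names come from the plan, while the select list and the use of the default GIN operator class are assumptions.

```sql
-- Assumed index definition; the plan only reveals the index name and its table.
-- The default jsonb operator class (jsonb_ops) supports the @> containment operator.
CREATE INDEX IF NOT EXISTS idx_properties_value_gin
    ON properties USING gin (value);

-- Look up a repository entity by its upstream_id property.
-- The select list (ei.*) is a placeholder, not taken from the PR.
EXPLAIN ANALYZE
SELECT ei.*
FROM entity_instances AS ei
JOIN properties AS p ON p.entity_id = ei.id
WHERE ei.entity_type = 'repository'
  AND p.key = 'upstream_id'
  AND p.value @> '{"value": "MBymWMNbcv", "version": "v1"}'::jsonb;
```

The containment predicate is what lets the planner use the GIN index; as the plan shows, the `key` comparison is then applied as a filter on the heap rows the index returned.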

Without the index, the same query falls back to a sequential scan:

Nested Loop  (cost=0.29..3165.63 rows=5 width=87) (actual time=0.117..42.194 rows=1 loops=1)
  ->  Seq Scan on properties p  (cost=0.00..3124.07 rows=5 width=16) (actual time=0.032..42.108 rows=1 loops=1)
        Filter: ((value @> '{"value": "MBymWMNbcv", "version": "v1"}'::jsonb) AND (key = 'upstream_id'::text))
        Rows Removed by Filter: 100004
  ->  Index Scan using entity_instances_pkey on entity_instances ei  (cost=0.29..8.31 rows=1 width=87) (actual time=0.082..0.082 rows=1 loops=1)
        Index Cond: (id = p.entity_id)
        Filter: (entity_type = 'repository'::entities)
Planning Time: 1.104 ms
Execution Time: 42.300 ms
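
One way to reproduce this comparison on a test database is to drop the index inside a transaction and roll back afterwards. This is only a sketch and not necessarily how the numbers above were obtained; the query shape and its select list are assumptions based on the plan output. `DROP INDEX` holds an exclusive lock on the table until the transaction ends, so this is not something to run against a production database.

```sql
BEGIN;

-- Temporarily remove the GIN index; the ROLLBACK below restores it.
DROP INDEX idx_properties_value_gin;

-- Same lookup as before, now forced to scan the properties table sequentially.
EXPLAIN ANALYZE
SELECT ei.*
FROM entity_instances AS ei
JOIN properties AS p ON p.entity_id = ei.id
WHERE ei.entity_type = 'repository'
  AND p.key = 'upstream_id'
  AND p.value @> '{"value": "MBymWMNbcv", "version": "v1"}'::jsonb;

ROLLBACK;
```

In the runs reported here, execution time is 0.590 ms with the index and 42.300 ms without it.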

Related: #4179

  • Bug fix (resolves an issue without affecting existing features)
  • Feature (adds new functionality without breaking changes)
  • Breaking change (may impact existing functionalities or require documentation updates)
  • Documentation (updates or additions to documentation)
  • Refactoring or test improvements (no bug fixes or new functionality)

Testing

Ran `make test`, and also exercised the change as part of a larger branch.

Review Checklist:

  • Reviewed my own code for quality and clarity.
  • Added comments to complex or tricky code sections.
  • Updated any affected documentation.
  • Included tests that validate the fix or feature.
  • Checked that related changes are merged.

@JAORMX JAORMX merged commit 4a80ebd into mindersec:main Aug 29, 2024
22 checks passed
@coveralls

Coverage Status

coverage: 53.951% (+0.004%) from 53.947%
when pulling 1b69cd4 on jhrozek:prop_search_by_attr
into 8f63f4b on stacklok:main.
