Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persist repository properties in database #4179

Closed
JAORMX opened this issue Aug 16, 2024 · 0 comments · Fixed by #4299
Closed

Persist repository properties in database #4179

JAORMX opened this issue Aug 16, 2024 · 0 comments · Fixed by #4299
Assignees
Labels
enhancement New feature or request

Comments

@JAORMX
Copy link
Contributor

JAORMX commented Aug 16, 2024

No description provided.

@JAORMX JAORMX added the enhancement New feature or request label Aug 16, 2024
jhrozek added a commit to jhrozek/minder that referenced this issue Aug 29, 2024
Adds a utility database function to search entities by ID. Because we
store entity properties as JSON, we add a GIN index.

Testing with 50.000 entities we get the following performance:
```
Nested Loop  (cost=30.39..109.45 rows=5 width=87) (actual time=0.290..0.292 rows=1 loops=1)
  ->  Bitmap Heap Scan on properties p  (cost=30.10..67.90 rows=5 width=16) (actual time=0.216..0.218 rows=1 loops=1)
"        Recheck Cond: (value @> '{""value"": ""MBymWMNbcv"", ""version"": ""v1""}'::jsonb)"
        Filter: (key = 'upstream_id'::text)
        Heap Blocks: exact=1
        ->  Bitmap Index Scan on idx_properties_value_gin  (cost=0.00..30.10 rows=10 width=0) (actual time=0.174..0.175 rows=1 loops=1)
"              Index Cond: (value @> '{""value"": ""MBymWMNbcv"", ""version"": ""v1""}'::jsonb)"
  ->  Index Scan using entity_instances_pkey on entity_instances ei  (cost=0.29..8.31 rows=1 width=87) (actual time=0.069..0.069 rows=1 loops=1)
        Index Cond: (id = p.entity_id)
        Filter: (entity_type = 'repository'::entities)
Planning Time: 2.365 ms
Execution Time: 0.590 ms
```

Compared to not using the index:
```
Nested Loop  (cost=0.29..3165.63 rows=5 width=87) (actual time=0.117..42.194 rows=1 loops=1)
  ->  Seq Scan on properties p  (cost=0.00..3124.07 rows=5 width=16) (actual time=0.032..42.108 rows=1 loops=1)
"        Filter: ((value @> '{""value"": ""MBymWMNbcv"", ""version"": ""v1""}'::jsonb) AND (key = 'upstream_id'::text))"
        Rows Removed by Filter: 100004
  ->  Index Scan using entity_instances_pkey on entity_instances ei  (cost=0.29..8.31 rows=1 width=87) (actual time=0.082..0.082 rows=1 loops=1)
        Index Cond: (id = p.entity_id)
        Filter: (entity_type = 'repository'::entities)
Planning Time: 1.104 ms
Execution Time: 42.300 ms
```

Related: mindersec#4179
@jhrozek jhrozek mentioned this issue Aug 29, 2024
10 tasks
JAORMX pushed a commit that referenced this issue Aug 29, 2024
Adds a utility database function to search entities by ID. Because we
store entity properties as JSON, we add a GIN index.

Testing with 50.000 entities we get the following performance:
```
Nested Loop  (cost=30.39..109.45 rows=5 width=87) (actual time=0.290..0.292 rows=1 loops=1)
  ->  Bitmap Heap Scan on properties p  (cost=30.10..67.90 rows=5 width=16) (actual time=0.216..0.218 rows=1 loops=1)
"        Recheck Cond: (value @> '{""value"": ""MBymWMNbcv"", ""version"": ""v1""}'::jsonb)"
        Filter: (key = 'upstream_id'::text)
        Heap Blocks: exact=1
        ->  Bitmap Index Scan on idx_properties_value_gin  (cost=0.00..30.10 rows=10 width=0) (actual time=0.174..0.175 rows=1 loops=1)
"              Index Cond: (value @> '{""value"": ""MBymWMNbcv"", ""version"": ""v1""}'::jsonb)"
  ->  Index Scan using entity_instances_pkey on entity_instances ei  (cost=0.29..8.31 rows=1 width=87) (actual time=0.069..0.069 rows=1 loops=1)
        Index Cond: (id = p.entity_id)
        Filter: (entity_type = 'repository'::entities)
Planning Time: 2.365 ms
Execution Time: 0.590 ms
```

Compared to not using the index:
```
Nested Loop  (cost=0.29..3165.63 rows=5 width=87) (actual time=0.117..42.194 rows=1 loops=1)
  ->  Seq Scan on properties p  (cost=0.00..3124.07 rows=5 width=16) (actual time=0.032..42.108 rows=1 loops=1)
"        Filter: ((value @> '{""value"": ""MBymWMNbcv"", ""version"": ""v1""}'::jsonb) AND (key = 'upstream_id'::text))"
        Rows Removed by Filter: 100004
  ->  Index Scan using entity_instances_pkey on entity_instances ei  (cost=0.29..8.31 rows=1 width=87) (actual time=0.082..0.082 rows=1 loops=1)
        Index Cond: (id = p.entity_id)
        Filter: (entity_type = 'repository'::entities)
Planning Time: 1.104 ms
Execution Time: 42.300 ms
```

Related: #4179
jhrozek added a commit to jhrozek/minder that referenced this issue Aug 29, 2024
jhrozek added a commit to jhrozek/minder that referenced this issue Aug 29, 2024
jhrozek added a commit that referenced this issue Aug 29, 2024
…4299)

* Accept transaction from outside in the property service

If we had called these methods in a transaction, we would have
effectivelly created another parallel transaction which might cause
entities to not be referenced correctly.

* Add FilteredCopy and Merge to properties

These will be useful later.

* Add EntityInstance model

This is just a shortcut to avoid passing the properties and project ID
and entity etc around

* Decouple entities from fetching properties

This makes it easier to handle multiple entities inside the github
provider as all that is needed is to implement the PropertyFetcher
interface. It also makes moves the logic around formatting the name and
what properties are operational into the provider to code that is
per-entity.

* Handle operational attributes

minder saves attributes like hook ID which do not come from the upstream
entity. We need to avoid deleting them on updates.

* Add a method to refresh a repository + lazy migration into the github repository service

Related: #4179

* Switch over webhook handling to use property entities

Changes fetchRepo to refresh the repository being processed and return
an EntityInstance model structure instead of a database repo

Related: #4173

* Propagate 404 from the property wrappers as a special error code
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants