Invalidation Scope Fix (C4-854) #222

willronchetti · 2022-07-07T20:12:43Z

Repairs several important cases in invalidation scope by revising the core algorithm, which is now described in the filter_invalidation_scope docstring.
Should work correctly for object fields, links beyond depth 1 and *
Other small changes include repairing the test script and allowing indexer worker runs to re-use testapp for 100 iterations (thus preserving cache, probably speeding up indexing and reducing DB load)

netsettler · 2022-07-08T14:10:15Z

snovault/commands/es_index_data.py

@@ -19,7 +21,10 @@ def run(app, uuids=None):
    }
    if uuids:
        post_body['uuids'] = list(uuids)
-    testapp.post_json('/index', post_body)
+        testapp.post_json('/index', post_body)
+    else:


So previously we did 1 post and now we do 100. That's a big jump. Can you add a comment here about the chosen value and why (a) it's so much bigger and (b) it's smaller than infinity. How is the amount of work this does to be conceptualized?

netsettler · 2022-07-08T14:13:12Z

snovault/tests/conftest.py

@@ -45,6 +45,7 @@ def start_moto_server_sqs():
            os.environ['SQS_URL'] = 'http://localhost:3000'  # must exists globally because of MPIndexer
            server_args = ['poetry', 'run', 'moto_server', 'sqs', '-p3000']
            server = subprocess.Popen(server_args, stdout=server_output, stderr=server_output)
+            time.sleep(5)


I would make a variable

APPROXIMATE_MOTO_STARTUP_SECONDS = 5

and use that.

netsettler · 2022-07-08T14:14:29Z

snovault/tests/test_invalidation_scope.py

@@ -162,6 +162,7 @@ def test_parent_type_object_schema():
    }


+@pytest.mark.skip  # mocks are not setup to handle the new method


Maybe a more explicit TODO describing the specifics of what is missing?

netsettler · 2022-07-08T14:18:45Z

snovault/crud_views.py

+    if 'delete_fields' in request.params:
+        deleted_fields = request.params['delete_fields'].split(',')
+        for deleted_field in deleted_fields:
+            body[deleted_field] = ''


Is the empty string a better value than None? Is it that None would not work or is the empty string treated the same as missing anyway? I feel like '' is sometimes semantically meaningful in a non-None kind of way and I'm wondering how heuristic vs. algorithmic this technique is. I guess the doc string says it's intended for diff, in which case maybe it doesn't matter. I think I'd like this to have a leading _ in its name so its clear this function is used only for a specific need and/or it wouldn't hurt, too, if its name also had _for_diff at the end of it.

netsettler

We noted some things when we went through this interactively that need tending to.
If you need me to look again after you do this, please let me know.
And the testing could be beefed up.
But basically I'm OK with this and giving it provisional review.
I had a few other non-blocking comments.

netsettler · 2022-07-08T16:03:19Z

snovault/elasticsearch/indexer_utils.py

    """
-    skip, diffs, child_to_parent_type = build_diff_metadata(registry, diff)
+    skip, diffs, diff_type, child_to_parent_type = build_diff_metadata(registry, diff)
+    valid_diff_types = child_to_parent_type.get(diff_type, []) + [diff_type]
    # go through all invalidated uuids, looking at the embedded list of the item type
    item_type_is_invalidated = {}


Odd. I've had called this item_types_invalidated.

netsettler · 2022-07-08T16:04:24Z

snovault/elasticsearch/indexer_utils.py

    # go through all invalidated uuids, looking at the embedded list of the item type
    item_type_is_invalidated = {}
    for invalidated_uuid, invalidated_item_type in invalidated_with_type:
        if skip is True:  # if we detected a change to a default embed, invalidate everything

-            # if in debug mode, populate invalidation metadata at the expense of performance
-            if verbose:


Was this a bug? It looks like it was doing real work only in verbose mode. That seems like a recipe for disaster. Was it work that was previously only used for debugging?

netsettler · 2022-07-08T16:05:14Z

snovault/elasticsearch/indexer_utils.py

-                    item_type_is_invalidated[invalidated_item_type] = True
-                continue
-            else:  # in production, exit immediately if we see this, as this works by side-effect
-                break


Oh, hmm. This is because you were previously looking for the first item, not the last item?

netsettler · 2022-07-08T16:07:11Z

snovault/tests/test_invalidation_scope.py

+        invalidated = [(obj['@id'], obj['@type'][0]) for obj in groups]
+        secondary = {obj['@id'] for obj in groups}
+        with type_embedded_list_mock(embedded_list=['sources.sample_objects.notes']):
+            self.runtest(testapp, diff, invalidated, secondary, 1)


I think in the long run it's better to do keyword calls to stuff like this in case the arguments change. Easy for random data like this to get accepted and just do the wrong thing if the arguments are not connected right.

netsettler · 2022-07-11T16:28:19Z

snovault/elasticsearch/indexer_utils.py

+            # the last link target ends and the terminal field begins
+            for i in range(len(split_embed), 0, -1):
+                embed_path = '.'.join(split_embed[0:i])
+                if embed_path.endswith('*'):


This needs a comment saying whether we're ignoring this case or handling it elsewhere, or why it's OK to just skip out. Yes, one might be able to figure it out, but comments are useful to answer obvious questions without making one read code.

netsettler · 2022-07-11T17:19:50Z

snovault/elasticsearch/indexer_utils.py

    """
-    skip, diffs, child_to_parent_type = build_diff_metadata(registry, diff)
+    skip, diffs, diff_type, child_to_parent_type = build_diff_metadata(registry, diff)
+    valid_diff_types = child_to_parent_type.get(diff_type, []) + [diff_type]


We noted in interactive review that this may need more scrutiny.

willronchetti added 8 commits June 27, 2022 13:49

repair delete_fields interaction

1521b2f

implement backtracking method, skip unit tests with incompatible mocks

c490e71

potential fix

1ce5b9a

few more fixes

34158d2

check that the field matches terminal field on *

0c2ac4e

account for bare * correctly

b638209

document

d444297

clean up a little, give beta

56e86f9

willronchetti requested a review from netsettler July 7, 2022 20:12

Merge branch 'master' into scope_fix

ab908c0

netsettler reviewed Jul 8, 2022

View reviewed changes

netsettler approved these changes Jul 11, 2022

View reviewed changes

willronchetti added 2 commits July 12, 2022 15:05

merge master

a4bab09

add note schema

b8301d9

netsettler changed the title ~~Invalidation Scope Fix~~ Invalidation Scope Fix (C4-854) Jul 13, 2022

willronchetti added 2 commits July 14, 2022 10:19

fix some more cases

9de048f

add status

6e98151

willronchetti merged commit 5717fdb into master Jul 14, 2022

willronchetti deleted the scope_fix branch July 14, 2022 15:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Invalidation Scope Fix (C4-854) #222

Invalidation Scope Fix (C4-854) #222

willronchetti commented Jul 7, 2022 •

edited

Loading

netsettler Jul 8, 2022

netsettler Jul 8, 2022

netsettler Jul 8, 2022

netsettler Jul 8, 2022

netsettler left a comment

netsettler Jul 8, 2022

netsettler Jul 8, 2022

netsettler Jul 8, 2022

netsettler Jul 8, 2022

netsettler Jul 11, 2022

netsettler Jul 11, 2022

		@@ -162,6 +162,7 @@ def test_parent_type_object_schema():
		}


		@pytest.mark.skip # mocks are not setup to handle the new method

Invalidation Scope Fix (C4-854) #222

Invalidation Scope Fix (C4-854) #222

Conversation

willronchetti commented Jul 7, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

netsettler left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

willronchetti commented Jul 7, 2022 •

edited

Loading