Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalidation Scope Fix (C4-854) #222

Merged
merged 13 commits into from
Jul 14, 2022
Merged

Invalidation Scope Fix (C4-854) #222

merged 13 commits into from
Jul 14, 2022

Conversation

willronchetti
Copy link
Member

@willronchetti willronchetti commented Jul 7, 2022

  • Repairs several important cases in invalidation scope by revising the core algorithm, which is now described in the filter_invalidation_scope docstring.
  • Should work correctly for object fields, links beyond depth 1 and *
  • Other small changes include repairing the test script and allowing indexer worker runs to re-use testapp for 100 iterations (thus preserving cache, probably speeding up indexing and reducing DB load)

@@ -19,7 +21,10 @@ def run(app, uuids=None):
}
if uuids:
post_body['uuids'] = list(uuids)
testapp.post_json('/index', post_body)
testapp.post_json('/index', post_body)
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So previously we did 1 post and now we do 100. That's a big jump. Can you add a comment here about the chosen value and why (a) it's so much bigger and (b) it's smaller than infinity. How is the amount of work this does to be conceptualized?

@@ -45,6 +45,7 @@ def start_moto_server_sqs():
os.environ['SQS_URL'] = 'http://localhost:3000' # must exists globally because of MPIndexer
server_args = ['poetry', 'run', 'moto_server', 'sqs', '-p3000']
server = subprocess.Popen(server_args, stdout=server_output, stderr=server_output)
time.sleep(5)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make a variable

APPROXIMATE_MOTO_STARTUP_SECONDS = 5

and use that.

@@ -162,6 +162,7 @@ def test_parent_type_object_schema():
}


@pytest.mark.skip # mocks are not setup to handle the new method
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a more explicit TODO describing the specifics of what is missing?

if 'delete_fields' in request.params:
deleted_fields = request.params['delete_fields'].split(',')
for deleted_field in deleted_fields:
body[deleted_field] = ''
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the empty string a better value than None? Is it that None would not work or is the empty string treated the same as missing anyway? I feel like '' is sometimes semantically meaningful in a non-None kind of way and I'm wondering how heuristic vs. algorithmic this technique is. I guess the doc string says it's intended for diff, in which case maybe it doesn't matter. I think I'd like this to have a leading _ in its name so its clear this function is used only for a specific need and/or it wouldn't hurt, too, if its name also had _for_diff at the end of it.

Copy link
Collaborator

@netsettler netsettler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We noted some things when we went through this interactively that need tending to.
If you need me to look again after you do this, please let me know.
And the testing could be beefed up.
But basically I'm OK with this and giving it provisional review.
I had a few other non-blocking comments.

"""
skip, diffs, child_to_parent_type = build_diff_metadata(registry, diff)
skip, diffs, diff_type, child_to_parent_type = build_diff_metadata(registry, diff)
valid_diff_types = child_to_parent_type.get(diff_type, []) + [diff_type]
# go through all invalidated uuids, looking at the embedded list of the item type
item_type_is_invalidated = {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd. I've had called this item_types_invalidated.

# go through all invalidated uuids, looking at the embedded list of the item type
item_type_is_invalidated = {}
for invalidated_uuid, invalidated_item_type in invalidated_with_type:
if skip is True: # if we detected a change to a default embed, invalidate everything

# if in debug mode, populate invalidation metadata at the expense of performance
if verbose:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this a bug? It looks like it was doing real work only in verbose mode. That seems like a recipe for disaster. Was it work that was previously only used for debugging?

item_type_is_invalidated[invalidated_item_type] = True
continue
else: # in production, exit immediately if we see this, as this works by side-effect
break
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, hmm. This is because you were previously looking for the first item, not the last item?

invalidated = [(obj['@id'], obj['@type'][0]) for obj in groups]
secondary = {obj['@id'] for obj in groups}
with type_embedded_list_mock(embedded_list=['sources.sample_objects.notes']):
self.runtest(testapp, diff, invalidated, secondary, 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the long run it's better to do keyword calls to stuff like this in case the arguments change. Easy for random data like this to get accepted and just do the wrong thing if the arguments are not connected right.

# the last link target ends and the terminal field begins
for i in range(len(split_embed), 0, -1):
embed_path = '.'.join(split_embed[0:i])
if embed_path.endswith('*'):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a comment saying whether we're ignoring this case or handling it elsewhere, or why it's OK to just skip out. Yes, one might be able to figure it out, but comments are useful to answer obvious questions without making one read code.

"""
skip, diffs, child_to_parent_type = build_diff_metadata(registry, diff)
skip, diffs, diff_type, child_to_parent_type = build_diff_metadata(registry, diff)
valid_diff_types = child_to_parent_type.get(diff_type, []) + [diff_type]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We noted in interactive review that this may need more scrutiny.

@netsettler netsettler changed the title Invalidation Scope Fix Invalidation Scope Fix (C4-854) Jul 13, 2022
@willronchetti willronchetti merged commit 5717fdb into master Jul 14, 2022
@willronchetti willronchetti deleted the scope_fix branch July 14, 2022 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants