Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug linking properties across cycles when UBID is matching criteria #4774

Open
perryr16 opened this issue Sep 5, 2024 · 0 comments
Open
Assignees

Comments

@perryr16
Copy link
Contributor

perryr16 commented Sep 5, 2024

Issue

Step 6 linking function does not account for the UBID Jaccard Index (as it does in the step 4 matching function).

This can lead to two error scenarios if properties have matching pm_property_ids (or other matching criteria), but different UBIDs. Consider the following setup.

P1: { cycle: 1, pm_property_id: 1, ubid: 1 }
P2: { cycle: 2, pm_property_id: 1, ubid: 2 }
  1. P1 is existing in Cycle 1 and P2 is uploaded to Cycle 2. These properties will be linked across cycles even though the UBIDs do not match.
P1: { cycle: 1, pm_property_id: 1, ubid: 1 }
P2: { cycle: 2, pm_property_id: 1, ubid: 2 }
P3: { cycle: 1, pm_property_id: 1, ubid: 2 }
  1. If P1 and P3 are uploaded to Cycle 1, they will appear as distinct properties as step 4 accounts for UBID. When P2 is uploaded to Cycle 2 we end up with an integrity error and the app hangs
seed_celery    | django.db.utils.IntegrityError: duplicate key value violates unique constraint "seed_propertyview_property_id_cycle_id_f8bdf6c2_uniq"
seed_celery    | DETAIL:  Key (property_id, cycle_id)=(31, 4) already exists.

Proposed solution:

Add the jaccard checking logic from step 4 to step 6.

  • Add a check_jaccard boolean argument to link_states
  • Insert jaccard checking logic into the for loop within function link_states
for idx, state in enumerate(states):
   existing_state_matches = match_lookup.get(tuple(getattr(state, c) for c in tuple_values), [])
   
   if check_jaccard:
         existing_state_matches = [
             existing_state
             for existing_state in existing_state_matches
             if existing_state.get("ubid")
             and check_jaccard_match(matching_criteria.get("ubid"), existing_state.get("ubid"), org.ubid_threshold)
         ]

   existing_views_matches = [view_lookup[x["id"]] for x in existing_state_matches]
   ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Testing on Dev/Stage
Development

No branches or pull requests

1 participant