fix(ingestion): fix AssertionError in base_transformer #7702

sgomezvillamor
Contributor

@sgomezvillamor sgomezvillamor commented Mar 28, 2023

This fixes an unnecessary validation that raises an AssertionError.

Context:

We have a transform with the following entity_types:

    def entity_types(self) -> List[str]:
        entities_with_ownership_aspect = ["dataset", "chart", "dashboard", "container"]
        return entities_with_ownership_aspect

In this scenario, when the ingestion pipeline processes MCE events, _should_process raises the exception because of the assert below: container is not in self.entity_type_mappings.

        if isinstance(record, MetadataChangeEventClass):
            for e in entity_types:
                assert (
                    e in self.entity_type_mappings
                ), f"Do not have a class mapping for {e}. Subscription to this entity will not work for transforming MCE-s"
                if isinstance(record.proposedSnapshot, self.entity_type_mappings[e]):
                    return True
            # fall through, no entity type matched
            return False

Instead, for MCE events, the lookup should consider only the entity_types that are included in the mapping, not all of them.
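A minimal sketch of that idea, not the actual patch: the snapshot classes and the ENTITY_TYPE_MAPPINGS dict are hypothetical stand-ins for the DataHub classes and for self.entity_type_mappings, and should_process_snapshot is an illustrative name. Entity types without a class mapping are skipped instead of tripping an assert.

```python
from typing import Dict, List, Type


class DatasetSnapshot:
    """Hypothetical stand-in for a DataHub snapshot class."""


class ChartSnapshot:
    """Hypothetical stand-in for a DataHub snapshot class."""


# Hypothetical mapping; it deliberately has no entry for "container".
ENTITY_TYPE_MAPPINGS: Dict[str, Type] = {
    "dataset": DatasetSnapshot,
    "chart": ChartSnapshot,
}


def should_process_snapshot(snapshot: object, entity_types: List[str]) -> bool:
    """Return True if the snapshot matches one of the subscribed entity types."""
    for e in entity_types:
        snapshot_class = ENTITY_TYPE_MAPPINGS.get(e)
        if snapshot_class is None:
            # No class mapping for this entity type: skip it rather than assert.
            continue
        if isinstance(snapshot, snapshot_class):
            return True
    # Fall through: no mapped entity type matched.
    return False
```

With this shape, a transformer that subscribes to "container" no longer fails on MCEs; it simply never matches an MCE through that entry.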

I also took this opportunity to add more entities to the mapping.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Mar 28, 2023
@sgomezvillamor sgomezvillamor marked this pull request as ready for review March 28, 2023 16:15
@hsheth2 hsheth2 changed the title fix(ingestion): fixes AssertionError in base_transformer fix(ingestion): fix AssertionError in base_transformer Mar 29, 2023
@hsheth2
Collaborator

hsheth2 commented Mar 29, 2023

I was thinking about this PR a bit more. We should really just extract the entity type from the MCE's urn (using guess_entity_type) and return true/false based on that, rather than this weird iteration + type mapping thing.

Not sure if you want to take that @sgomezvillamor or if we should leave that for a follow-up
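A rough sketch of what that suggestion could look like, under stated assumptions: guess_entity_type_sketch is a simplified local stand-in that only splits a urn of the form urn:li:<entityType>:..., not DataHub's own helper, and should_process_urn is an illustrative name. For an MCE, the urn would presumably come from the proposed snapshot.

```python
from typing import List


def guess_entity_type_sketch(urn: str) -> str:
    """Simplified stand-in: read the entity type out of 'urn:li:<type>:...'."""
    if not urn.startswith("urn:li:"):
        raise ValueError(f"urn {urn!r} does not start with 'urn:li:'")
    return urn.split(":")[2]


def should_process_urn(urn: str, entity_types: List[str]) -> bool:
    """Decide from the urn alone; no snapshot-class mapping is needed."""
    return guess_entity_type_sketch(urn) in entity_types
```

Because the decision depends only on the urn, any entity type a transformer subscribes to is handled the same way, including ones such as container that had no class mapping.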

@codecov-commenter

codecov-commenter commented Mar 29, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -7.92 ⚠️

Comparison is base (c0f7ba2) 74.98% compared to head (3c047d7) 67.06%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7702      +/-   ##
==========================================
- Coverage   74.98%   67.06%   -7.92%     
==========================================
  Files         353      353              
  Lines       35422    35418       -4     
==========================================
- Hits        26560    23753    -2807     
- Misses       8862    11665    +2803     
Flag Coverage Δ
pytest-testIntegration ?
pytest-testIntegrationBatch1 36.51% <50.00%> (+<0.01%) ⬆️
pytest-testQuick 63.58% <100.00%> (-0.01%) ⬇️
pytest-testSlowIntegration 32.98% <50.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../datahub/ingestion/transformer/base_transformer.py 93.26% <100.00%> (-0.25%) ⬇️

... and 77 files with indirect coverage changes


@sgomezvillamor
Contributor Author

Hi @hsheth2! That's a nice suggestion, because then we can get rid of the entity_type_mappings! Done in 92fed22.

@hsheth2
Collaborator

hsheth2 commented Mar 29, 2023

@sgomezvillamor thanks!

@hsheth2 hsheth2 merged commit 2580847 into datahub-project:master Mar 29, 2023
@sgomezvillamor sgomezvillamor deleted the fix-base_transformer-safe-should-process branch March 30, 2023 07:02
yoonhyejin pushed a commit that referenced this pull request Apr 3, 2023