feat(ingest/vertica): performance improvement and bug fixes #8328

Merged: 89 commits, Aug 1, 2023

@vishalkSimplify commented Jun 28, 2023

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub
  1. Rearchitected query execution to minimize the total number of queries issued during ingestion, because customers with large catalogs were seeing unacceptable performance. The plugin now captures information at the schema level into memory and then iterates over that in-memory data to get individual object details. In our own testing, the new changes improved performance by more than 90% compared to the existing Vertica plugin.

  2. Removed the OAuth metadata features from the Vertica plugin because they exposed security-related information.

  3. Added an integration test for the Vertica plugin that covers all of our features, as requested by DataHub.

  4. Added form-based UI ingestion for the Vertica plugin.

  5. Upgraded the Vertica dialect from 0.0.1 to 0.0.8, which supports the latest SQLAlchemy features.

  6. Bug fixes and other small improvements.
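
To illustrate the schema-level capture described in item 1, here is a minimal sketch of the general pattern: run one query per schema, group the rows by table in memory, and read per-object details from that cache instead of querying once per table. This is not the plugin's actual code. `FakeCatalog`, `load_schema_columns`, and the row shape are hypothetical names chosen for illustration, and the catalog is an in-memory stand-in for a real Vertica connection.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical row shape for illustration: (table_name, column_name, data_type).
ColumnRow = Tuple[str, str, str]


def load_schema_columns(
    fetch_schema_rows: Callable[[str], Iterable[ColumnRow]], schema: str
) -> Dict[str, List[Tuple[str, str]]]:
    """Issue one schema-level query and group the rows by table in memory."""
    columns_by_table: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for table, column, data_type in fetch_schema_rows(schema):
        columns_by_table[table].append((column, data_type))
    return dict(columns_by_table)


class FakeCatalog:
    """Stand-in for a database; counts how many queries are issued."""

    def __init__(self, rows: List[Tuple[str, str, str, str]]) -> None:
        self.rows = rows  # (schema, table, column, data_type)
        self.query_count = 0

    def fetch_schema_rows(self, schema: str) -> List[ColumnRow]:
        self.query_count += 1  # each call models one round trip
        return [(t, c, d) for (s, t, c, d) in self.rows if s == schema]


catalog = FakeCatalog(
    [
        ("public", "orders", "id", "int"),
        ("public", "orders", "total", "numeric"),
        ("public", "customers", "id", "int"),
        ("store", "items", "sku", "varchar"),
    ]
)
cache = load_schema_columns(catalog.fetch_schema_rows, "public")

# Per-table details now come from memory, with no further queries.
for table_name in sorted(cache):
    print(table_name, cache[table_name])
```

With this shape, the number of queries grows with the number of schemas rather than the number of tables, which is the kind of reduction item 1 describes for large catalogs.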

@hsheth2 changed the title from "Vertica plugin performance improvement and bug fixes" to "feat(ingest/vertica): performance improvement and bug fixes" on Jul 19, 2023
@anshbansal merged commit ef3b948 into datahub-project:master on Aug 1, 2023
45 checks passed
yoonhyejin pushed a commit that referenced this pull request Aug 24, 2023
spadhi7 added a commit to spadhi7/datahub that referenced this pull request Aug 29, 2023
* tag 'v0.10.5': (222 commits)
  fix(test): increase siblings.js test stability (datahub-project#8542)
  feat(search): Allow aggregating on facets that are not explicitly part of default filter set (datahub-project#8540)
  fix(ui) Make multiple small updates to new search and browse (datahub-project#8524)
  feat(presto-on-hive): allow v1 fieldpaths in the presto-on-hive source (datahub-project#8474)
  feat(cli): Adds ability to upload recipes to DataHub's UI (datahub-project#8317)
  feat(browseV2): add browseV2 logic to system update (datahub-project#8506)
  fix(ingest/json-schema): convert non-string enums to strings (datahub-project#8479)
  feat(ingestion/tableau): support column level lineage for custom sql (datahub-project#8466)
  test(ingest): test case statements with sql parser (datahub-project#8437)
  feat(ingest/vertica): performance improvement and bug fixes (datahub-project#8328)
  ci: reduce git fetch depth (datahub-project#8473)
  fix(ingest): remove duplication of tags (datahub-project#8532)
  docs: small update to homepage (datahub-project#8483)
  fix(ingest): pin boto3-stubs in CI (datahub-project#8527)
  feat(siblings): hiding non-existant siblings in FE (datahub-project#8528)
  fix(ingest/build): Fix sagemaker mypy and flake8 issues (datahub-project#8530)
  feat(metrics): add metrics for aspect write and bytes (datahub-project#8526)
  feat(elasticsearch): allow bulk delete (datahub-project#8424)
  fix(ui): use locale lowercase when filtering columns of an entity in the lineage (datahub-project#8213)
  fix(auth): ignore case when comparing http headers (datahub-project#8356)
  ...
Labels
  • community-contribution: PR or Issue raised by member(s) of DataHub Community
  • ingestion: PR or Issue related to the ingestion of metadata
  • merge-pending-ci: A PR that has passed review and should be merged once CI is green.
  • product: PR or Issue related to the DataHub UI/UX
4 participants