feat: allow hive sql to be provided as config #312

feng-tao · 2020-08-08T04:22:31Z

Summary of Changes

This PR fixes amundsen-io/amundsen#552 by allowing users to provide their own Hive metastore SQL through the extractor config.
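As a rough illustration of the idea (not the code in this PR), the sketch below shows a config-driven override: if the config carries a user-supplied statement it is used as-is, otherwise a built-in template is formatted with the where-clause suffix. The key strings, `DEFAULT_SQL_TEMPLATE`, and `resolve_sql` are hypothetical names chosen for the example, and a plain mapping stands in for the real config object.

```python
from typing import Mapping

# Hypothetical key strings; the real constants are defined on HiveTableMetadataExtractor.
EXTRACT_SQL_KEY = "extract_sql"
WHERE_CLAUSE_SUFFIX_KEY = "where_clause_suffix"

# Illustrative default query template, not the extractor's actual SQL.
DEFAULT_SQL_TEMPLATE = "SELECT t.TBL_NAME FROM TBLS t {where_clause_suffix}"


def resolve_sql(conf: Mapping[str, str]) -> str:
    """Return the user-supplied SQL if present, else the formatted default."""
    user_sql = conf.get(EXTRACT_SQL_KEY)
    if user_sql:
        # Assumption: a user-provided statement is used unchanged.
        return user_sql
    suffix = conf.get(WHERE_CLAUSE_SUFFIX_KEY, "")
    return DEFAULT_SQL_TEMPLATE.format(where_clause_suffix=suffix).strip()


custom = resolve_sql({EXTRACT_SQL_KEY: "SELECT 1"})
default = resolve_sql({WHERE_CLAUSE_SUFFIX_KEY: "WHERE t.TBL_TYPE = 'MANAGED_TABLE'"})
```

With this shape, a metastore backend whose dialect rejects the default query can be supported by changing configuration only.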

Tests

Yes. Added a unit test that covers the new config.

Documentation

What documentation did you add or modify and why? Add any relevant links then remove this line

CheckList

Make sure you have checked all steps below to ensure a timely review.

- PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  - In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.
- PR includes a summary of changes.
- PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
- In case of new functionality, my PR adds documentation that describes how to use it.
  - All the public functions and the classes in the PR contain docstrings that explain what it does
- PR passes make test

codecov-commenter · 2020-08-08T04:23:59Z

Codecov Report

Merging #312 into master will increase coverage by 0.79%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #312      +/-   ##
==========================================
+ Coverage   74.30%   75.10%   +0.79%     
==========================================
  Files         105      105              
  Lines        4492     4997     +505     
  Branches      419      518      +99     
==========================================
+ Hits         3338     3753     +415     
- Misses       1049     1127      +78     
- Partials      105      117      +12

Impacted Files	Coverage Δ
...builder/extractor/hive_table_metadata_extractor.py	`94.33% <100.00%> (+0.22%)`	⬆️
databuilder/rest_api/rest_api_failure_handlers.py	`90.00% <0.00%> (-3.34%)`	⬇️
databuilder/rest_api/base_rest_api_query.py	`92.59% <0.00%> (-1.53%)`	⬇️
...ilder/transformer/regex_str_replace_transformer.py	`95.34% <0.00%> (-1.08%)`	⬇️
...tabuilder/extractor/postgres_metadata_extractor.py	`94.68% <0.00%> (-0.49%)`	⬇️
databuilder/callback/call_back.py	`92.30% <0.00%> (-0.29%)`	⬇️
...abuilder/extractor/snowflake_metadata_extractor.py	`95.09% <0.00%> (-0.22%)`	⬇️
databuilder/loader/file_system_neo4j_csv_loader.py	`89.23% <0.00%> (-0.06%)`	⬇️
databuilder/extractor/db2_metadata_extractor.py	`0.00% <0.00%> (ø)`
databuilder/extractor/mysql_metadata_extractor.py	`0.00% <0.00%> (ø)`
... and 5 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d24cba9...86f5668.

Golodhros

LGTM

jinhyukchang

Left one comment.

jinhyukchang · 2020-08-10T21:16:53Z

databuilder/extractor/hive_table_metadata_extractor.py

            where_clause_suffix=conf.get_string(HiveTableMetadataExtractor.WHERE_CLAUSE_SUFFIX_KEY))

+        self.sql_stmt = conf.get_string(HiveTableMetadataExtractor.EXTRACT_SQL.format(


I think it's missing a closing bracket?

conf.get_string(HiveTableMetadataExtractor.EXTRACT_SQL.format --> conf.get_string(HiveTableMetadataExtractor.EXTRACT_SQL).format

By the way, we may not need to add the where clause if they provide the SQL statement. WDYT?

feng-tao · Aug 10, 2020

Actually yeah, we should just let them provide the SQL.
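To make the parenthesis placement in this thread concrete, here is a minimal sketch using a stand-in config object. `FakeConf`, the key string, and the SQL template are hypothetical and only model a `get_string` lookup; this is not databuilder code.

```python
class FakeConf:
    """Hypothetical stand-in for the config object; only get_string is modeled."""

    def __init__(self, values):
        self._values = dict(values)

    def get_string(self, key):
        return self._values[key]


# Hypothetical key string and template, chosen for illustration only.
EXTRACT_SQL = "extract_sql"
conf = FakeConf({EXTRACT_SQL: "SELECT * FROM TBLS {where_clause_suffix}"})

# Parenthesis closed after the key: the template is looked up, then formatted.
formatted = conf.get_string(EXTRACT_SQL).format(where_clause_suffix="WHERE 1 = 1")

# Parenthesis closed after format(): the key string itself is formatted.
# It has no placeholders, so format() returns it unchanged and the template
# comes back with its placeholder still in place.
unformatted = conf.get_string(EXTRACT_SQL.format(where_clause_suffix="WHERE 1 = 1"))
```

Under the outcome agreed above, neither form would be needed for user-provided SQL, since the statement would be taken from the config without any formatting.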

* commit 'e14b33e776929f8b020f1c6fec75d0fb83687693': (23 commits)
  Fix Athena sample DAG (amundsen-io#341)
  fix: Update postgres_sample_dag to set table extract job as upstream for elastic search publisher (amundsen-io#340)
  chore: mypy cleanup (convert last comment types, remove noqa imports) (amundsen-io#338)
  chore: Convert typings to mypy (amundsen-io#311)
  chore: replace all references of Lyft repo with Amundsen (amundsen-io#323)
  feat: add github actions for databuilder (amundsen-io#336)
  build: fix broken tests in Python 3.7, test in CI (amundsen-io#334)
  fix(deps): Unpin attrs (amundsen-io#332)
  ci: add dependabot config (amundsen-io#330)
  Change repo name in travis file (amundsen-io#324)
  tests: add mock for bigquery auth (amundsen-io#313)
  feat: allow hive sql to be provided as config (amundsen-io#312)
  chore: remove python2 (amundsen-io#310)
  chore: update deps for databuilder (amundsen-io#309)
  fix: cypher statement param issue in Neo4jStalenessRemovalTask (amundsen-io#307)
  fix: Added missing job tag key in hive_sample_dag.py (amundsen-io#308)
  feat: enhance glue extractor (amundsen-io#306)
  fix: Fix sql for missing columns and mysql based dialects (#550) (amundsen-io#305)
  docs: Fix broken doc link to dashboard_execution model (amundsen-io#296)
  chore: apply license headers to all the source files (amundsen-io#304)
  ...

# Conflicts:
#	README.md
#	databuilder/extractor/kafka_source_extractor.py
#	databuilder/publisher/neo4j_csv_publisher.py
#	docs/models.md
#	example/scripts/sample_data_loader.py
#	setup.py

feng-tao added 2 commits August 7, 2020 21:18

feat: allow hive sql to be provided as config

2eab3f5

update test name

fff5ab4

feng-tao mentioned this pull request Aug 8, 2020

amundsendatabuilder -> HiveTableMetadataExtractor only works with mysql innodb amundsen-io/amundsen#552

Closed

feng-tao assigned dikshathakur3119 and jinhyukchang Aug 8, 2020

feng-tao added 2 commits August 7, 2020 21:24

update test name

4c35359

remove print

9c84665

jornh mentioned this pull request Aug 8, 2020

TTransport.TTransportException: TSocket read 0 byte for Thrift Hive metastore amundsen-io/amundsen#591

Closed

feng-tao assigned Golodhros Aug 10, 2020

Golodhros previously approved these changes Aug 10, 2020

jinhyukchang reviewed Aug 10, 2020

update

86f5668

feng-tao dismissed Golodhros’s stale review via 86f5668 August 10, 2020 21:37

jinhyukchang approved these changes Aug 10, 2020

feng-tao merged commit 8075a6c into master Aug 10, 2020

feng-tao deleted the tfeng_change_hive_sql branch August 10, 2020 23:03
