Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ETL-329] delete check #12

Merged
merged 5 commits into from
Feb 2, 2023
Merged

[ETL-329] delete check #12

merged 5 commits into from
Feb 2, 2023

Conversation

rxu17
Copy link
Contributor

@rxu17 rxu17 commented Feb 1, 2023

Purpose: Expands check for deleted records in all other Healthkit data types in get_metadata:

  • HealthKitV2Heartbeat
  • HealthKitV2Electrocardiogram
  • HealthKitV2Workouts

Also adds tests, and Dockerfile for testing the changes

@rxu17 rxu17 requested a review from a team as a code owner February 1, 2023 03:27
Copy link
Contributor

@philerooski philerooski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just needs some minor changes in the tests. Also, what are the empty __init__.py files for? Can we get rid of them to reduce clutter?

]
and basename_components[-2] == "Deleted"
):
metadata["type"] = "{}_Deleted".format(metadata["type"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I definitely prefer to use f-string when possible, but personally, the double/single quote is annoying

f"{metadata['type']}_Deleted"

we can...

meta_type = metadata['type']
f"{meta_type}_Deleted"

I am also ok with .format in this specific scenario.

Copy link
Contributor Author

@rxu17 rxu17 Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I have seen it, but thought that in this scenario, it would look a bit cleaner to have .format.


assert (
s3_to_json.get_metadata(
"HealthKitV2Heartbeat_Deleted_20201022-20211022.json"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HealthKitV2Heartbeat_Samples_Deleted_20201022-20211022.json

class TestS3ToJsonS3:
def test_get_metadata_type(self):
assert (
s3_to_json.get_metadata("HealthKitV2Samples_20201022-20211022.json")["type"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the actual format like HealthKitV2Samples_AppleExerciseTime_20230112-20230114.json


assert (
s3_to_json.get_metadata(
"HealthKitV2Electrocardiogram_Deleted_20201022-20211022.json"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HealthKitV2Electrocardiogram_Samples_Deleted_20201022-20211022.json

@rxu17
Copy link
Contributor Author

rxu17 commented Feb 1, 2023

Just needs some minor changes in the tests. Also, what are the empty __init__.py files for? Can we get rid of them to reduce clutter?

I had to add them because the test script is nested inside the subdirectories under tests/glue/jobs and adding init.py allows me to import them like src.glue.jobs import s3_to_json for example. Same goes for the lambda app script test as well. I can just remove the glue/jobs folders under tests and have these tests sit right under tests and we won't have have the blank __init__.py files.

@philerooski
Copy link
Contributor

I want to try the tests for myself -- in the meanwhile, can you update line 50 of the tests README so that all tests are run and not just the lambda tests?

@rxu17
Copy link
Contributor Author

rxu17 commented Feb 1, 2023

I want to try the tests for myself -- in the meanwhile, can you update line 50 of the tests README so that all tests are run and not just the lambda tests?

Hmm I think currently the lambda tests are the only ones that can be run locally via a pipenv which is why I left that to be lambda specific otherwise you'll run into an error with pytest since test_s3_to_json.py has to be run in a Dockerfile.

@philerooski
Copy link
Contributor

Makes sense. Can you include a sentence that says as much in the README?

@philerooski
Copy link
Contributor

Do you also get this error when running tests?

ImportError while importing test module '/home/glue_user/workspace/recover/tests/test_s3_to_json_s3.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib64/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_s3_to_json_s3.py:7: in <module>
    from src.glue.jobs import s3_to_json_s3
E   ImportError: cannot import name 's3_to_json_s3' from 'src.glue.jobs' (unknown location)

@rxu17
Copy link
Contributor Author

rxu17 commented Feb 1, 2023

Nope, you're getting this error inside the Docker image?

@philerooski
Copy link
Contributor

Yeah. I figured it out. In the tests it should read:

from src.glue.jobs import s3_to_json

The file name is different than it is in BD.

@rxu17
Copy link
Contributor Author

rxu17 commented Feb 2, 2023

Yup it is different. Are you saying the tests don't have from src.glue.jobs import s3_to_json?

@philerooski
Copy link
Contributor

I confused myself. I had an untracked file in tests that was using the incorrect import.

Copy link
Member

@thomasyu888 thomasyu888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 Nice work! Thanks for all the reviews

@rxu17 rxu17 merged commit 71d76c8 into main Feb 2, 2023
@rxu17 rxu17 deleted the etl-329-delete-check branch February 2, 2023 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants