-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move all fields test to new framework #239
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't thoroughly reviewed the actual tests. But like the base case, We should evaluate what is already in the inherited case and try to only write code for the differences if possible. Once you get the base case comments cleaned up, lets try to do this together.
tests/base_hubspot.py
Outdated
def get_properties(self): | ||
start_date = dt.today() - timedelta(days=1) | ||
start_date_with_fmt = dt.strftime(start_date, self.START_DATE_FORMAT) | ||
|
||
return {'start_date' : start_date_with_fmt} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the new framework we make the start date a property so it can be changed in tests outside of this method.
def get_properties(self): | |
start_date = dt.today() - timedelta(days=1) | |
start_date_with_fmt = dt.strftime(start_date, self.START_DATE_FORMAT) | |
return {'start_date' : start_date_with_fmt} | |
# set the default start date which can be overridden in the tests. | |
start_date = BaseCase.timedelta_formatted(dt.utcnow(), delta=timedelta(days=-1)) | |
def get_properties(self): | |
return {'start_date': self.start_date, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed - Changed this method as per the suggestion
tests/base_hubspot.py
Outdated
"campaigns": { | ||
BaseCase.PRIMARY_KEYS: {"id"}, | ||
BaseCase.REPLICATION_METHOD: BaseCase.FULL_TABLE, | ||
HubspotBaseCase.OBEYS_START_DATE: False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this property OBEYS_START_DATE
although not currently a framework concept would apply to more taps and should be moved into the tap tester framework
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved this to the tap-tester base case
tests/base_hubspot.py
Outdated
BaseCase.PRIMARY_KEYS: {"companyId"}, | ||
BaseCase.REPLICATION_METHOD: BaseCase.INCREMENTAL, | ||
BaseCase.REPLICATION_KEYS: {"property_hs_lastmodifieddate"}, | ||
HubspotBaseCase.EXPECTED_PAGE_SIZE: 250, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expected page size is now a concept in the framework as API_LIMIT. We should use this instead.
https://github.com/stitchdata/tap-tester/blob/ed0886e9a9bd3f1340ad4929a03f2d67ab4ebf2a/tap_tester/base_suite_tests/base_case.py#L88
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed EXPECTED_PAGE_SIZE to API_LIMIT to use the variable already defined in tap-tester base case
tests/base_hubspot.py
Outdated
BaseCase.REPLICATION_METHOD: BaseCase.INCREMENTAL, | ||
HubspotBaseCase.EXPECTED_PAGE_SIZE: 100, | ||
HubspotBaseCase.OBEYS_START_DATE: True, | ||
HubspotBaseCase.PARENT_STREAM: 'companies' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how common parent stream is, If this is unique to hubspot then doing it like this is appropriate. If we think there are more cases where this would exist we should also move this concept to the tap-tester framework.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see PARENT_STREAM defined. Can you show me where this is? Did it move to BaseCase? If so, we should specify that instead of HubspotBaseCase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed it.. It is moved to tap-tester base case. I changed to use BaseCase.PARENT_STREAM
tests/base_hubspot.py
Outdated
def expected_primary_keys(self): | ||
""" | ||
return a dictionary with key of table name | ||
and value as a set of primary key fields | ||
""" | ||
return {table: properties.get(self.PRIMARY_KEYS, set()) | ||
for table, properties | ||
in self.expected_metadata().items()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not be overriding base case methods unless there is a good reason to make them different. In this case there is no difference except expanded functionality in the base case. We should remove this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah.. that's right- a lot of methose methods are already there in tap-tester base case. Didn't realise it. Removed all these methods from here.
tests/base_hubspot.py
Outdated
def expected_primary_keys(self): | ||
|
||
""" | ||
return a dictionary with key of table name | ||
and value as a set of primary key fields | ||
""" | ||
return {table: properties.get(self.PRIMARY_KEYS, set()) | ||
for table, properties | ||
in self.expected_metadata().items()} | ||
|
||
def expected_automatic_fields(self): | ||
auto_fields = {} | ||
for k, v in self.expected_metadata().items(): | ||
auto_fields[k] = v.get(self.PRIMARY_KEYS, set()) | v.get(self.REPLICATION_KEYS, set()) | ||
return auto_fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also true for these methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/base_hubspot.py
Outdated
########################## | ||
# Common Test Actions # | ||
########################## |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for all of these as well. These are all in tap tester base case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/base_hubspot.py
Outdated
def datetime_from_timestamp(self, value, str_format="%Y-%m-%dT00:00:00Z"): | ||
""" | ||
Takes in a unix timestamp in milliseconds. | ||
Returns a string formatted python datetime | ||
""" | ||
try: | ||
datetime_value = dt.fromtimestamp(value) | ||
datetime_str = dt.strftime(datetime_value, str_format) | ||
except ValueError as err: | ||
raise NotImplementedError( | ||
f"Invalid argument 'value': {value} " | ||
"This method was designed to accept unix timestamps in milliseconds." | ||
) | ||
return datetime_str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is the only tap that uses unix timestamps, but this function looks like it could be helpful to other taps. I would consider if it should be moved to base case in the helper function? https://github.com/stitchdata/tap-tester/blob/ed0886e9a9bd3f1340ad4929a03f2d67ab4ebf2a/tap_tester/base_suite_tests/base_case.py#L474-L522
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved this to tap-tester base case
tests/base_hubspot.py
Outdated
def is_child(self, stream): | ||
"""return true if this stream is a child stream""" | ||
return self.expected_metadata()[stream].get(self.PARENT_STREAM) is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should follow PARENT_STREAM. If that stays here this should, if that goes to base case this should follow it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved this to tap-tester base case
can_save = True | ||
return ret_records | ||
|
||
FIELDS_ADDED_BY_TAP = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should probably be in the class and not at the module level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved all these to the new hubspot base case - base_hubspot.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments that are not explained in detail and would be challenging to implement. Let's get together to discuss them so it is clear what solutions should be implemented and how to go about it.
tests/base_hubspot.py
Outdated
def expected_check_streams(self): | ||
return set(self.expected_metadata().keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same as this method in tap-tester with a different name. It would be best to remove this and rename the calls to it to the method in tap-tester. At the minimum if we want to keep this name for some reason, just can make it a wrapper and call the underlying tap-tester method in it as follows:
def expected_check_streams(self): | |
return set(self.expected_metadata().keys()) | |
@classmethod | |
def expected_check_streams(cls): | |
return cls.expected_stream_names() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is used only start_date test. So, I removed this. I think we can change start date test when we move that to the new framework.
tests/base_hubspot.py
Outdated
BaseCase.REPLICATION_METHOD: BaseCase.INCREMENTAL, | ||
HubspotBaseCase.EXPECTED_PAGE_SIZE: 100, | ||
HubspotBaseCase.OBEYS_START_DATE: True, | ||
HubspotBaseCase.PARENT_STREAM: 'companies' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see PARENT_STREAM defined. Can you show me where this is? Did it move to BaseCase? If so, we should specify that instead of HubspotBaseCase.
def convert_datatype(self, expected_records): | ||
for stream, records in expected_records.items(): | ||
for record in records: | ||
|
||
# convert timestamps to string formatted datetime | ||
timestamp_keys = {'timestamp'} | ||
for key in timestamp_keys: | ||
timestamp = record.get(key) | ||
if timestamp: | ||
unformatted = datetime.datetime.fromtimestamp(timestamp/1000) | ||
formatted = datetime.datetime.strftime(unformatted, self.BASIC_DATE_FORMAT) | ||
record[key] = formatted | ||
|
||
return expected_records |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handling unix timestamps and converting them to a formatted string looks like something that is useful in multiple taps. I would like to work to see if we can make the part of this that actually does the conversion be a utility we can use in tap-tester itself so we can use this logic once and if there are any improvements just update it in one place. Maybe we can pair on this.
LOGGER.info("The test client found %s %s records.", len(records), stream) | ||
|
||
self.convert_datatype(self.expected_records) | ||
super().setUp() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the fact that you just have the differences in the setup that are unique to this tap and then call the existing setup so it is easy to see why we are overriding this method.
@unittest.skip("Random selection doesn't always sync records") | ||
def test_all_streams_sync_records(self): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we skip this test if we have a client to create data. Wouldn't we verify in the sync that at least one record is present and if it wasn't we can create it. For instance if there isn't a contact we can use
Line 781 in 85f3de9
def create_contacts(self): |
@unittest.skip("Skip till all cards of missing fields are fixed. TDL-16145 ") | ||
def test_all_fields_for_streams_are_replicated(self): | ||
for stream in self.test_streams: | ||
with self.subTest(stream=stream): | ||
|
||
# gather expectations | ||
expected_all_keys = self.selected_fields.get(stream, set()) - set(self.MISSING_FIELDS.get(stream, {})) | ||
|
||
# gather results | ||
fields_replicated = self.actual_fields.get(stream, set()) | ||
|
||
# verify that all fields are sent to the target | ||
# test the combination of all records | ||
self.assertSetEqual(fields_replicated, expected_all_keys, | ||
logging=f"verify all fields are replicated for stream {stream}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same test that is above. Why skip this if we are running it above. Python also uses the latest definition of a method, so it is possible you are overwriting your main test and not running it.
|
||
return expected_records | ||
|
||
def test_all_fields_for_streams_are_replicated(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test in tap-tester is written concisely. It already has a MISSING_FIELDS concept. We should update tap tester to have extra concepts for ADDED_FIELDS. I'm not sure if there can be anything but missing and added fields but if there are categories of things that add up to a total of MISSING_FIELDS we can track them separately and then build missing fields for the test.
…c of missing and extra fields in tap-tester framework and exclude fields with non data programmatically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious if this is passing (with the tap-tester changes) for all the streams. It looks very streamlined. Love it.
tests/base_hubspot.py
Outdated
'deals': { | ||
# BUG_TDL-14993 | https://jira.talendforge.org/browse/TDL-14993 | ||
# Has an value of object with key 'value' and value 'Null' | ||
'property_hs_date_entered_1258834', | ||
'property_hs_time_in_example_stage1660743867503491_315775040' | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is only used for the all_fields test. This is duplicative of the bad keys below and isn't necessary as there are just specific examples. We should move the bug info down to the method for removing the bad keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah.. This just got carried over from the old test.
So, we added these extra fields and removed the bad prefixes later.
Not needed here, will remove it.
for key in self.expected_all_keys: | ||
for bad_prefix in bad_key_prefixes: | ||
if key.startswith(bad_prefix): | ||
bad_keys.add(key) | ||
for key in self.fields_replicated: | ||
for bad_prefix in bad_key_prefixes: | ||
if key.startswith(bad_prefix): | ||
bad_keys.add(key) | ||
|
||
for key in bad_keys: | ||
if key in self.expected_all_keys: | ||
self.expected_all_keys.remove(key) | ||
if key in self.fields_replicated: | ||
self.fields_replicated.remove(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two comments:
- I'm not sure why we need to build bad keys and then go through them. It appears we could just remove them at the same time.
- If the key is in both the expected_all_keys and the fields_replicated this would pass. So I am uncertain as to why we would need to remove this from both our expectations and our actual results. Can we log if a key is removed and where it is removed from when we do this. I would expect it to only be in the actual results (fields_replicated) and not necessary in the expectations. If this isn't a good assumption I would like to figure out what I'm not understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the first comment, I felt the same and removed it directly instead of adding to bad_keys, but it complained that the set has changed while iterating.
For the second comment - let me print and see what is being removed. It could be the result of adding that extra fields in the above comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the method to include only the bad keys that are not in both the lists.
# Tests To Skip | ||
########################################################################## | ||
|
||
@unittest.skip("Skip till all cards of missing fields are fixed. TDL-16145 ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be nice place to use the tap-tester @skipUntilDone to ensure this gets picked up as soon as TDL-16145 is completed.
Description of change
https://jira.talendforge.org/browse/TDL-24438
Made a new base class to use the new framework.
Made a new all_fields_test class to use the new framework and separate the tests for record count validation and record values validation.
Manual QA steps
Risks
Rollback steps