Move all fields test to new framework #239

bhuvana-talend · 2023-11-01T17:25:36Z

Description of change

https://jira.talendforge.org/browse/TDL-24438
Made a new base class to use the new framework.
Made a new all_fields_test class to use the new framework and separate the tests for record count validation and record values validation.

Manual QA steps

Risks

Rollback steps

revert this branch

HarrisonMarcRose

I haven't thoroughly reviewed the actual tests, but as with the base case, we should evaluate what is already in the inherited case and try to write code only for the differences where possible. Once you get the base case comments cleaned up, let's try to do this together.

HarrisonMarcRose · 2023-11-16T14:32:39Z

tests/base_hubspot.py

+    def get_properties(self):
+        start_date = dt.today() - timedelta(days=1)
+        start_date_with_fmt = dt.strftime(start_date, self.START_DATE_FORMAT)
+
+        return {'start_date' : start_date_with_fmt}


In the new framework we make the start date a property so it can be changed in tests outside of this method.

Suggested change

-    def get_properties(self):
-        start_date = dt.today() - timedelta(days=1)
-        start_date_with_fmt = dt.strftime(start_date, self.START_DATE_FORMAT)
-
-        return {'start_date' : start_date_with_fmt}
+    # set the default start date which can be overridden in the tests.
+    start_date = BaseCase.timedelta_formatted(dt.utcnow(), delta=timedelta(days=-1))
+
+    def get_properties(self):
+        return {'start_date': self.start_date,

Agreed - Changed this method as per the suggestion
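
For anyone reading this thread later, here is a minimal sketch of the idea: the start date lives on the class as an attribute, so a test can override it without redefining get_properties. The class names and the date format below are illustrative stand-ins, not the real tap-tester BaseCase or its timedelta_formatted helper.

```python
from datetime import datetime, timedelta, timezone

START_DATE_FORMAT = "%Y-%m-%dT00:00:00Z"  # assumed format, for illustration only


class SketchBaseCase:
    """Stand-in for a base case; not the real tap-tester class."""

    # default start date (yesterday, UTC); subclasses or tests may override it
    start_date = (datetime.now(timezone.utc) - timedelta(days=1)).strftime(START_DATE_FORMAT)

    def get_properties(self):
        # read the attribute at call time so overrides are picked up
        return {"start_date": self.start_date}


class SketchStartDateTest(SketchBaseCase):
    # override the default without touching get_properties
    start_date = "2021-05-01T00:00:00Z"


default_props = SketchBaseCase().get_properties()
overridden_props = SketchStartDateTest().get_properties()
```

The subclass only states what differs, and get_properties stays in one place.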

HarrisonMarcRose · 2023-11-16T14:34:57Z

tests/base_hubspot.py

+            "campaigns": {
+                BaseCase.PRIMARY_KEYS: {"id"},
+                BaseCase.REPLICATION_METHOD: BaseCase.FULL_TABLE,
+                HubspotBaseCase.OBEYS_START_DATE: False


I think this property, OBEYS_START_DATE, although not currently a framework concept, would apply to more taps and should be moved into the tap-tester framework.

Moved this to the tap-tester base case

HarrisonMarcRose · 2023-11-16T14:36:39Z

tests/base_hubspot.py

+                BaseCase.PRIMARY_KEYS: {"companyId"},
+                BaseCase.REPLICATION_METHOD: BaseCase.INCREMENTAL,
+                BaseCase.REPLICATION_KEYS: {"property_hs_lastmodifieddate"},
+                HubspotBaseCase.EXPECTED_PAGE_SIZE: 250,


Expected page size is now a concept in the framework as API_LIMIT. We should use this instead.
https://github.com/stitchdata/tap-tester/blob/ed0886e9a9bd3f1340ad4929a03f2d67ab4ebf2a/tap_tester/base_suite_tests/base_case.py#L88

Changed EXPECTED_PAGE_SIZE to API_LIMIT to use the variable already defined in tap-tester base case

HarrisonMarcRose · 2023-11-16T14:38:37Z

tests/base_hubspot.py

+                BaseCase.REPLICATION_METHOD: BaseCase.INCREMENTAL,
+                HubspotBaseCase.EXPECTED_PAGE_SIZE: 100,
+                HubspotBaseCase.OBEYS_START_DATE: True,
+                HubspotBaseCase.PARENT_STREAM: 'companies'


I'm not sure how common a parent stream is. If this is unique to hubspot, then doing it like this is appropriate. If we think there are more cases where this would exist, we should also move this concept to the tap-tester framework.

I don't see PARENT_STREAM defined. Can you show me where this is? Did it move to BaseCase? If so, we should specify that instead of HubspotBaseCase.

Missed it. It has moved to the tap-tester base case. I changed this to use BaseCase.PARENT_STREAM.

HarrisonMarcRose · 2023-11-16T14:41:03Z

tests/base_hubspot.py

+    def expected_primary_keys(self):
+        """
+        return a dictionary with key of table name
+        and value as a set of primary key fields
+        """
+        return {table: properties.get(self.PRIMARY_KEYS, set())
+                for table, properties
+                in self.expected_metadata().items()}


We should not be overriding base case methods unless there is a good reason to make them different. In this case there is no difference except expanded functionality in the base case. We should remove this method.

This exists in https://github.com/stitchdata/tap-tester/blob/ed0886e9a9bd3f1340ad4929a03f2d67ab4ebf2a/tap_tester/base_suite_tests/base_case.py#L162-L172

Ah, that's right: a lot of those methods are already in the tap-tester base case. I didn't realise it. Removed all these methods from here.
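
A small sketch of why the override is unnecessary: when the tap's case supplies only expected_metadata, the inherited helper already produces the per-stream primary keys. The classes and the PRIMARY_KEYS value here are hypothetical stand-ins, not the actual tap-tester code linked above.

```python
class SketchBaseCase:
    """Stand-in for a shared base case; not the real tap-tester class."""

    PRIMARY_KEYS = "table-key-properties"  # assumed constant value

    @classmethod
    def expected_metadata(cls):
        raise NotImplementedError("each tap supplies its own metadata")

    @classmethod
    def expected_primary_keys(cls):
        # defined once here and inherited by every tap's case
        return {stream: props.get(cls.PRIMARY_KEYS, set())
                for stream, props in cls.expected_metadata().items()}


class SketchHubspotCase(SketchBaseCase):
    """Only the tap-specific difference lives in the subclass."""

    @classmethod
    def expected_metadata(cls):
        return {
            "campaigns": {cls.PRIMARY_KEYS: {"id"}},
            "companies": {cls.PRIMARY_KEYS: {"companyId"}},
        }


primary_keys = SketchHubspotCase.expected_primary_keys()
```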

HarrisonMarcRose · 2023-11-16T14:42:37Z

tests/base_hubspot.py

+    def expected_primary_keys(self):
+
+        """
+        return a dictionary with key of table name
+        and value as a set of primary key fields
+        """
+        return {table: properties.get(self.PRIMARY_KEYS, set())
+                for table, properties
+                in self.expected_metadata().items()}
+
+    def expected_automatic_fields(self):
+        auto_fields = {}
+        for k, v in self.expected_metadata().items():
+            auto_fields[k] = v.get(self.PRIMARY_KEYS, set()) | v.get(self.REPLICATION_KEYS, set())
+        return auto_fields


Also true for these methods

HarrisonMarcRose · 2023-11-16T14:43:16Z

tests/base_hubspot.py

+    ##########################
+    #  Common Test Actions   #
+    ##########################


Same for all of these as well. These are all in tap tester base case.

HarrisonMarcRose · 2023-11-16T14:47:04Z

tests/base_hubspot.py

+    def datetime_from_timestamp(self, value, str_format="%Y-%m-%dT00:00:00Z"):
+        """
+        Takes in a unix timestamp in milliseconds.
+        Returns a string formatted python datetime
+        """
+        try:
+            datetime_value = dt.fromtimestamp(value)
+            datetime_str = dt.strftime(datetime_value, str_format)
+        except ValueError as err:
+            raise NotImplementedError(
+                f"Invalid argument 'value':  {value}  "
+                "This method was designed to accept unix timestamps in milliseconds."
+            )
+        return datetime_str


Not sure if this is the only tap that uses unix timestamps, but this function looks like it could be helpful to other taps. I would consider whether it should be moved to the base case alongside the other helper functions: https://github.com/stitchdata/tap-tester/blob/ed0886e9a9bd3f1340ad4929a03f2d67ab4ebf2a/tap_tester/base_suite_tests/base_case.py#L474-L522

Moved this to tap-tester base case
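
As a rough sketch of what a shared helper could look like: the hunk above documents millisecond input but passes the value straight to dt.fromtimestamp, which expects seconds, so this version scales the value and pins the result to UTC. The function name and default format are assumptions for illustration, not what was actually added to tap-tester.

```python
from datetime import datetime, timezone


def datetime_from_timestamp_ms(value, str_format="%Y-%m-%dT%H:%M:%SZ"):
    """Format a unix timestamp given in milliseconds as a UTC datetime string."""
    # bool is a subclass of int, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(
            f"Expected a numeric unix timestamp in milliseconds, got {value!r}"
        )
    # fromtimestamp expects seconds, so scale the milliseconds down first
    moment = datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    return moment.strftime(str_format)


epoch = datetime_from_timestamp_ms(0)
formatted = datetime_from_timestamp_ms(1700000000000)
```

Passing tz=timezone.utc keeps the output independent of the machine's local timezone, which matters when expected records are compared against synced records.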

HarrisonMarcRose · 2023-11-16T14:47:39Z

tests/base_hubspot.py

+    def is_child(self, stream):
+        """return true if this stream is a child stream"""
+        return self.expected_metadata()[stream].get(self.PARENT_STREAM) is not None


This should follow PARENT_STREAM: if that stays here, this should stay too; if that goes to the base case, this should go with it.

Moved this to tap-tester base case

HarrisonMarcRose · 2023-11-16T14:50:23Z

tests/test_hubspot_newfw_all_fields.py

+        can_save = True
+    return ret_records
+
+FIELDS_ADDED_BY_TAP = {


These should probably be in the class and not at the module level.

Moved all these to the new hubspot base case - base_hubspot.py

HarrisonMarcRose

I left some comments that are not explained in detail and would be challenging to implement. Let's get together to discuss them so it is clear what solutions should be implemented and how to go about it.

HarrisonMarcRose · 2023-11-27T19:35:43Z

tests/base_hubspot.py

+    def expected_check_streams(self):
+        return set(self.expected_metadata().keys())


This is the same as this method in tap-tester with a different name. It would be best to remove this and rename the calls to it to the method in tap-tester. At a minimum, if we want to keep this name for some reason, we can just make it a wrapper that calls the underlying tap-tester method, as follows:

Suggested change

-    def expected_check_streams(self):
-        return set(self.expected_metadata().keys())
+    @classmethod
+    def expected_check_streams(cls):
+        return cls.expected_stream_names()

This method is used only in the start_date test, so I removed it. I think we can change the start date test when we move it to the new framework.

HarrisonMarcRose · 2023-11-27T19:39:58Z

tests/base_hubspot.py

+                BaseCase.REPLICATION_METHOD: BaseCase.INCREMENTAL,
+                HubspotBaseCase.EXPECTED_PAGE_SIZE: 100,
+                HubspotBaseCase.OBEYS_START_DATE: True,
+                HubspotBaseCase.PARENT_STREAM: 'companies'


I don't see PARENT_STREAM defined. Can you show me where this is? Did it move to BaseCase? If so, we should specify that instead of HubspotBaseCase.

HarrisonMarcRose · 2023-11-27T19:46:21Z

tests/test_hubspot_newfw_all_fields.py

+    def convert_datatype(self, expected_records):
+        for stream, records in expected_records.items():
+            for record in records:
+
+                # convert timestamps to string formatted datetime
+                timestamp_keys = {'timestamp'}
+                for key in timestamp_keys:
+                    timestamp = record.get(key)
+                    if timestamp:
+                        unformatted = datetime.datetime.fromtimestamp(timestamp/1000)
+                        formatted = datetime.datetime.strftime(unformatted, self.BASIC_DATE_FORMAT)
+                        record[key] = formatted
+
+        return expected_records


Handling unix timestamps and converting them to a formatted string looks like something that is useful in multiple taps. I would like to see if we can make the part that actually does the conversion a utility in tap-tester itself, so the logic lives in one place and any improvements only need to be made once. Maybe we can pair on this.

HarrisonMarcRose · 2023-11-27T19:47:22Z

tests/test_hubspot_newfw_all_fields.py

+            LOGGER.info("The test client found %s %s records.", len(records), stream)
+
+        self.convert_datatype(self.expected_records)
+        super().setUp()


Love the fact that you just have the differences in the setup that are unique to this tap and then call the existing setup so it is easy to see why we are overriding this method.

HarrisonMarcRose · 2023-11-27T19:51:22Z

tests/test_hubspot_newfw_all_fields.py

+    @unittest.skip("Random selection doesn't always sync records")
+    def test_all_streams_sync_records(self):
+        pass


Why would we skip this test if we have a client to create data? Couldn't we verify in the sync that at least one record is present and, if it isn't, create one? For instance, if there isn't a contact we can use:

tap-hubspot/tests/client.py

Line 781 in 85f3de9

def create_contacts(self):

HarrisonMarcRose · 2023-11-27T19:52:33Z

tests/test_hubspot_newfw_all_fields.py

+    @unittest.skip("Skip till all cards of missing fields are fixed. TDL-16145 ")
+    def test_all_fields_for_streams_are_replicated(self):
+        for stream in self.test_streams:
+            with self.subTest(stream=stream):
+
+                # gather expectations
+                expected_all_keys = self.selected_fields.get(stream, set()) - set(self.MISSING_FIELDS.get(stream, {}))
+
+                # gather results
+                fields_replicated = self.actual_fields.get(stream, set())
+
+                # verify that all fields are sent to the target
+                # test the combination of all records
+                self.assertSetEqual(fields_replicated, expected_all_keys,
+                                    logging=f"verify all fields are replicated for stream {stream}")


This is the same test as the one above. Why skip this if we are running it above? Python also uses the latest definition of a method, so it is possible you are overwriting your main test and not running it.
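
To make the redefinition concern concrete, here is a small standalone sketch (the class and test names are made up). Two methods with the same name in one class body leave only the last one bound, so the skipped copy silently replaces the real test.

```python
import unittest


class SketchAllFieldsTest(unittest.TestCase):
    def test_all_fields(self):
        # the "real" test; this definition is replaced by the one below
        self.assertTrue(True)

    @unittest.skip("skipped duplicate")
    def test_all_fields(self):  # same name: rebinding discards the first method
        self.fail("never runs")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SketchAllFieldsTest)
result = unittest.TestResult()
suite.run(result)

tests_found = suite.countTestCases()
skipped_count = len(result.skipped)
```

The loader finds a single test, and it is reported as skipped, so the first definition never executes.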

HarrisonMarcRose · 2023-11-27T20:02:44Z

tests/test_hubspot_newfw_all_fields.py

+
+        return expected_records
+
+    def test_all_fields_for_streams_are_replicated(self):


I think this test in tap-tester is written concisely. It already has a MISSING_FIELDS concept. We should update tap-tester to have an extra concept for ADDED_FIELDS. I'm not sure if there can be anything other than missing and added fields, but if there are categories of things that add up to the total MISSING_FIELDS, we can track them separately and then build the missing fields for the test.
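
One possible shape for that expectation, as a sketch only: the function name, arguments, and example field names are hypothetical, and how tap-tester would actually model ADDED_FIELDS is still to be decided.

```python
def expected_fields(selected_fields, missing_fields, added_fields):
    """Expected replicated fields: selected, minus known-missing, plus tap-added."""
    return (set(selected_fields) - set(missing_fields)) | set(added_fields)


# illustrative values, not real hubspot catalog fields
selected = {"id", "name", "archived"}
missing = {"archived"}          # fields selected in the catalog but never replicated
added = {"versionTimestamp"}    # fields the tap emits that are not in the catalog

expected = expected_fields(selected, missing, added)
```

If the missing fields come from several categories, each category can be kept as its own set and unioned before being passed in.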

HarrisonMarcRose

Curious if this is passing (with the tap-tester changes) for all the streams. It looks very streamlined. Love it.

HarrisonMarcRose · 2023-12-04T21:30:37Z

tests/base_hubspot.py

+        'deals': {
+            # BUG_TDL-14993 | https://jira.talendforge.org/browse/TDL-14993
+            #                 Has an value of object with key 'value' and value 'Null'
+            'property_hs_date_entered_1258834',
+            'property_hs_time_in_example_stage1660743867503491_315775040'
+        },


I think this is only used for the all_fields test. It duplicates the bad keys below and isn't necessary, as these are just specific examples. We should move the bug info down to the method for removing the bad keys.

Ah.. This just got carried over from the old test.
So, we added these extra fields and removed the bad prefixes later.
Not needed here, will remove it.

HarrisonMarcRose · 2023-12-04T21:35:10Z

tests/test_hubspot_newfw_all_fields.py

+            for key in self.expected_all_keys:
+                for bad_prefix in bad_key_prefixes:
+                   if key.startswith(bad_prefix):
+                        bad_keys.add(key)
+            for key in self.fields_replicated:
+                for bad_prefix in bad_key_prefixes:
+                   if key.startswith(bad_prefix):
+                        bad_keys.add(key)
+
+            for key in bad_keys:
+                if key in self.expected_all_keys:
+                    self.expected_all_keys.remove(key)
+                if key in self.fields_replicated:
+                    self.fields_replicated.remove(key)


Two comments:

I'm not sure why we need to build bad keys and then go through them. It appears we could just remove them at the same time.

If the key is in both expected_all_keys and fields_replicated, this would pass, so I am uncertain why we would need to remove it from both our expectations and our actual results. Can we log when a key is removed and where it is removed from? I would expect it to be only in the actual results (fields_replicated) and not necessarily in the expectations. If this isn't a good assumption, I would like to figure out what I'm not understanding.

For the first comment, I felt the same and removed the keys directly instead of adding them to bad_keys, but it complained that the set had changed while iterating.
For the second comment, let me print and see what is being removed. It could be the result of adding those extra fields mentioned in the comment above.

Fixed the method to include only the bad keys that are not in both the lists.
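
For reference, a sketch of both points in this thread. Removing from a set while looping over it raises RuntimeError, which is why the direct removal failed; building the bad keys with a comprehension and then using set difference avoids that. The function below is one reading of the fix (only keys present in exactly one of the two sets are treated as bad), and the prefix and names are assumptions for illustration.

```python
def remove_bad_keys(expected_keys, replicated_keys, bad_prefixes):
    """Drop bad-prefixed keys that appear in only one of the two sets."""
    prefixes = tuple(bad_prefixes)  # str.startswith accepts a tuple of prefixes
    one_sided = expected_keys ^ replicated_keys  # keys in exactly one set
    bad_keys = {key for key in one_sided if key.startswith(prefixes)}
    # set difference builds new sets, so nothing is mutated mid-iteration
    return expected_keys - bad_keys, replicated_keys - bad_keys, bad_keys


# mutating the set being iterated is what triggered the original error
keys = {"id", "property_hs_date_entered_1", "property_hs_date_entered_2"}
try:
    for key in keys:
        if key.startswith("property_hs_date_entered_"):
            keys.remove(key)
    mutation_failed = False
except RuntimeError:
    mutation_failed = True

expected = {"id", "property_hs_date_entered_1"}
replicated = {"id", "property_hs_date_entered_1", "property_hs_date_entered_2"}
clean_expected, clean_replicated, removed = remove_bad_keys(
    expected, replicated, ["property_hs_date_entered_"]
)
```

Returning the removed keys makes it easy to log exactly what was dropped and from which side.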

bhtowles · 2023-12-18T17:41:46Z

tests/test_hubspot_newfw_all_fields.py

+    # Tests To Skip
+    ##########################################################################
+
+    @unittest.skip("Skip till all cards of missing fields are fixed. TDL-16145 ")


This would be a nice place to use the tap-tester @skipUntilDone to ensure this gets picked up as soon as TDL-16145 is completed.

Move all fields test to new framework

60aa211

HarrisonMarcRose reviewed Nov 16, 2023

bhuvana-talend added 2 commits November 20, 2023 22:09

As per PR review, removed the overrridden methods and moved some comm…

bd451ee

…on items to the base case

Moved some fields to tap-tester base case

16279de

HarrisonMarcRose requested changes Nov 27, 2023

bhuvana-talend added 3 commits November 28, 2023 18:03

Implemented PR Review comments

8e118c9

Improvements in new framework all fields test to incorporate the logi…

b279239

…c of missing and extra fields in tap-tester framework and exclude fields with non data programmatically

Removed not-used method and constants

cb62f07

HarrisonMarcRose reviewed Dec 4, 2023

bhuvana-talend and others added 4 commits December 5, 2023 00:06

Removed the extra fields for deals as it is covered in bad prefix

1026c65

Fixed removed_bad_keys to add to bad keys only if it is not in the ot…

b9ef05e

…her list

Merge branch 'master' into tdl-24438-2

fcd97dc

Merge branch 'master' into tdl-24438-2

de4bdb1

HarrisonMarcRose approved these changes Dec 11, 2023

bhuvana-talend and others added 2 commits December 12, 2023 16:03

Merge branch 'master' into tdl-24438-2

25ba332

Removed unneeded line

e57af17

bhtowles reviewed Dec 18, 2023

bhuvana-talend merged commit e57af17 into master Dec 18, 2023
15 checks passed
