Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement parsing of reports for custom dimensions #7602

Closed
nfmohit opened this issue Sep 20, 2023 · 12 comments
Closed

Implement parsing of reports for custom dimensions #7602

nfmohit opened this issue Sep 20, 2023 · 12 comments
Labels
P0 High priority PHP Type: Enhancement Improvement of an existing feature

Comments

@nfmohit
Copy link
Collaborator

nfmohit commented Sep 20, 2023

Feature Description

The response of report requests containing certain custom dimensions should be additionally parsed:

  1. The googlesitekit_post_author dimension will output the author’s user ID. Hence, its response should be parsed such that the author display name is reported.
  2. The googlesitekit_post_categories dimension will output the category IDs. Hence, its response should be parsed such that the category names are reported.

For example, if Site Kit makes a report request with the googlesitekit_post_author custom dimension, GA4 would return a user ID, e.g. 1. When sending this response back to the Site Kit dashboard, the user ID should be replaced with the display name of the user with that ID.

See relevant section in the design doc.


Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

  • If the newsKeyMetrics feature flag is enabled:
    • The GA report response for the googlesitekit_post_author custom dimension will output the author's user ID. This should be parsed in the Site Kit backend such that the display name is obtained from the user ID and that is sent to the client, i.e. the parsed value should be returned in the report replacing the original dimension value returned from GA4.
    • The GA report response for the googlesitekit_post_categories custom dimension will output a comma-separated list of category IDs. This should be parsed in the Site Kit backend such that the category names are obtained from the IDs and a comma-separated list of the category names are sent to the client, i.e. the parsed value should be returned in the report replacing the original dimension value returned from GA4.

Implementation Brief

  • In includes/Modules/Analytics_4/Report/Response:
    • Create methods to parse the individual custom dimensions:
      • get_author_display_name(): This method should take in a user_id and return the display name of that user. This can be obtained by using the get_userdata WordPress function that can be used to access the detailed WP_User object, and return the display_name property.
      • get_category_names(): This method should take in an array of category IDs and return an array of category names. This can be obtained by using the get_cat_name WordPress function.
    • Create a method called parse_custom_dimensions(). This method will take in the same parameter as Response::parse_response()'s $response (the Google_Service_AnalyticsData_RunReportResponse object).
      • In this method, loop through the dimensions requested within $response['dimensionHeaders'].
      • If any of the name's match googlesitekit_post_author or googlesitekit_post_categories, then call the appropriate method created above.
      • Pass the correct $response['dimensionValues'].
      • Overwrite the $response['dimensionValues'] converting their IDs to the display names that the above methods return.
      • NOTE: To accommodate for commas within category names itself, ensure each category name is a string, pushed into an array and then the entire array is JSON encoded. E.g. the return value of the above method would be a array of strings and can be JSON encoded to a single string, i.e., where Category,1 and Category,2 are single categories: '["Category,1","Category,2"]'
    • In the Response::parse_response() method:
      • Call the parse_custom_dimensions() method ensuring the $response object is modified correctly.

Test Coverage

  • Within tests/phpunit/integration/Modules/Analytics_4/ResponseTest.php:
    • Add PHPUnit tests for all the new methods being created.
    • Test the modification of $response by adding further cases to the test_parse_response method.

QA Brief

  • Setup Site Kit and connect a GA4 property that has some data for the customEvent:googlesitekit_post_author and customEvent:googlesitekit_post_categories custom dimensions. You may use news-key-metrics.instawp.xyz. Ask @nfmohit or @jimmymadon to grant you access to this domain's GA4 account/property/web datastream. NOTE: If you are connecting a GA4 property which has a different domain to the one you are currently using, then ensure the "Custom Site URL" within the Tester Plugin is set to the domain associated with the GA4 property, i.e. news-key-metrics.instawp.xyz in our case.
  • In the browser console, run the following requests which fetch screenPageViews for the custom dimensions in the ACs. These requests will eventually be run in the plugin in Create “Top categories by pageviews” key metric widget tile #7607 and Create “Most popular authors by pageviews” key metric widget tile #7605.
    • Post author:
    await googlesitekit.api.get('modules', 'analytics-4', 'report', {
        metrics: [
            'screenPageViews'
        ],
        dimensions: [
            'customEvent:googlesitekit_post_author'
        ]
    });
    • Post categories:
    await googlesitekit.api.get('modules', 'analytics-4', 'report', {
        metrics: [
            'screenPageViews'
        ],
        dimensions: [
            'customEvent:googlesitekit_post_categories'
        ]
    });
  • Verify the response is as per the AC, i.e. the dimensionValues within rows show display names where a display name is found, otherwise, the original ID is passed through for now.

Changelog entry

  • N/A
@nfmohit nfmohit added Type: Enhancement Improvement of an existing feature P0 High priority labels Sep 20, 2023
@nfmohit nfmohit assigned nfmohit and unassigned nfmohit Sep 25, 2023
@techanvil techanvil self-assigned this Sep 26, 2023
@techanvil
Copy link
Collaborator

Hey @nfmohit, this AC is looking good. One detail I think would be useful to specify here is how the parsed value should be returned in the response.

That is to say, I would imagine that we need to simply return the parsed value in the report in place of the dimension value returned by GA4 - trying to add a extra value to the report would result in an unexpected report shape. It would be good to be clear about that here in the AC though, just for the avoidance of doubt.

@techanvil techanvil assigned nfmohit and unassigned techanvil Sep 26, 2023
@nfmohit
Copy link
Collaborator Author

nfmohit commented Sep 27, 2023

Hey @nfmohit, this AC is looking good. One detail I think would be useful to specify here is how the parsed value should be returned in the response.

That is to say, I would imagine that we need to simply return the parsed value in the report in place of the dimension value returned by GA4 - trying to add a extra value to the report would result in an unexpected report shape. It would be good to be clear about that here in the AC though, just for the avoidance of doubt.

Thank you for the kind review, @techanvil. I've updated the ACs to be more precise. Please let me know if this looks good.

@nfmohit nfmohit assigned techanvil and unassigned nfmohit Sep 27, 2023
@techanvil
Copy link
Collaborator

That's great, thanks @nfmohit!

AC ✅

@techanvil techanvil removed their assignment Sep 27, 2023
@jimmymadon jimmymadon assigned jimmymadon and unassigned jimmymadon Sep 27, 2023
@aaemnnosttv aaemnnosttv added the PHP label Oct 3, 2023
@techanvil techanvil self-assigned this Oct 4, 2023
@techanvil
Copy link
Collaborator

Hi @jimmymadon, thanks for drafting this IB. I have a few thoughts - one of which is really a broader AC/design level point.

  • First the minor method-naming related points:
    • I'd suggest renaming the parse_ methods along these lines. To me, these aren't really doing much in the way of parsing, and are in fact just straightforward lookups. Also, there doesn't seem much need to include the specificity of custom_dimension in the name, they are each performing a fairly generic function and keeping the names more generic would lend them to easier lifting out/reusing later. They'll still be called from within a custom dimension-named method so it should be clear enough what's going on here.
      • parse_custom_dimension_post_author -> get_author_display_name
      • parse_custom_dimension_post_categories -> get_category_names
    • Funnily enough though, I think the filter_custom_dimensions method would be better named parse_custom_dimensions, (or map something, but parse would be more consistent within the class). I know that within WP, "filters" can modify the value they return, but this isn't a filter hook, and outside of this when I see something called filter_x I do still tend to think of the filter and map operation types. YMMV but parse_ would read more easily to me here.
  • Last but certainly not least (and apologies as I should really have called this out at the AC review stage), there is the issue of how to deal with category names that include a comma when we're creating our comma-separated list. We'd need to encode/escape the comma somehow, or quote the name - but then we'd need to deal with these additional characters in the names too :) It's what an off-the-shelf CSV parser/formatter would handle, but we don't have any CSV utilities available to us in the codebase. So, it's making me think we'd actually be better off encoding this as a JSON-stringified array of strings; this would obviously impact the definition of any issues where we're using these values on the frontend. What do you think? Cc @nfmohit.

@techanvil techanvil assigned jimmymadon and unassigned techanvil Oct 4, 2023
@nfmohit
Copy link
Collaborator Author

nfmohit commented Oct 4, 2023

a JSON-stringified array of strings

Sounds fancy! I think the only issue this change would affect is #7600.

@jimmymadon
Copy link
Collaborator

@techanvil Thanks for the helpful IB review as always. Everything makes better sense. Back to you for another pass!

@jimmymadon jimmymadon assigned techanvil and unassigned jimmymadon Oct 5, 2023
@techanvil
Copy link
Collaborator

Thanks @jimmymadon! This is looking good - one very minor detail, I had rather imagined that get_category_names() would still return an array, and the calling parse_custom_dimensions() would do the JSON stringification. I guess it doesn't really matter too much but that way get_category_names() is a bit more generic and potentially reusable.

That aside - Nahid mentioned that this change would affect #7600, fortunately this is already assigned to you, so I guess this is one to bear in mind there. I would have thought it would affect #7607 as well though, which is in the EB.

Actually while reading #7607 something else has occurred to me... I've realised it's possible for more than one category to have the same name (if it's created under a different parent category). So I'm not sure if we might be introducing a possible issue here by making the returned dimensions potentially non-unique... Do you know if this aspect has been considered? Cc @nfmohit

@techanvil techanvil assigned jimmymadon and unassigned techanvil Oct 5, 2023
@jimmymadon
Copy link
Collaborator

@techanvil - Have updated the IB regarding the JSON stringification.

As for the category ancestor naming, it is something we can do as a nice to have in the future as per this comment.

@jimmymadon jimmymadon assigned techanvil and unassigned jimmymadon Oct 5, 2023
@techanvil
Copy link
Collaborator

Thanks @jimmymadon, that looks great, and it seems fair to tackle the issue of duplicates later as it shouldn't cause us a problem as things stand.

IB ✅

@techanvil techanvil removed their assignment Oct 6, 2023
@jimmymadon jimmymadon self-assigned this Oct 8, 2023
@jimmymadon jimmymadon assigned aaemnnosttv and unassigned jimmymadon Oct 16, 2023
@jimmymadon jimmymadon assigned aaemnnosttv and unassigned jimmymadon Oct 16, 2023
@aaemnnosttv aaemnnosttv removed their assignment Oct 16, 2023
@wpdarren wpdarren self-assigned this Oct 17, 2023
@wpdarren
Copy link
Collaborator

QA Update: ⚠️

@jimmymadon I would like to double-check an observation with you before approving this.

The author looks fine. The correct value is appearing as expected. The value for the category though, I see 'uncategorized' which is correct, but there are also values 6 and 7 included. Is this expected? On my site, I only have one category, but maybe this is coming from your news test site, and you have additional categories there. Can you confirm this?

image

@jimmymadon
Copy link
Collaborator

@wpdarren Yes - the 6 and 7 IDs do exist on the News Key Metrics test site (I can verify this when I connect it locally). Because your WP database does not have actual categories for IDs 6 and 7, there is no category name found and we've currently decided to just pass through the original IDs themselves.

@jimmymadon jimmymadon removed their assignment Oct 17, 2023
@wpdarren
Copy link
Collaborator

QA Update: ✅

Verified:

  • The response is as per the AC, i.e. the dimension/values within rows show display names where a display name is found.
Screenshots

image
image

@wpdarren wpdarren removed their assignment Oct 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P0 High priority PHP Type: Enhancement Improvement of an existing feature
Projects
None yet
Development

No branches or pull requests

5 participants