Handle the special case for “new visitors” and “returning visitors” to retrieve their metrics for the Selection Panel #8160

Closed
7 tasks
techanvil opened this issue Jan 25, 2024 · 7 comments
Labels
Module: Analytics Google Analytics module related issues P1 Medium priority Team M Issues for Squad 2 Type: Enhancement Improvement of an existing feature

Comments

@techanvil
Collaborator

techanvil commented Jan 25, 2024

Feature Description

Implement the special case for “new visitors” and “returning visitors” on the Selection Panel whereby their metrics are retrieved using the newVsReturning dimension instead of via the audience until the corresponding audiences are out of the "partial data" state.

See special case to avoid "partial data" state in the design doc.


Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

  • If either of the “new visitors” and “returning visitors” audiences is determined to be in the "partial data" state (see Add “partial data” states infrastructure #8141):
    • The Selection Panel user counts for “new visitors” and “returning visitors” should be retrieved via a report that uses the newVsReturning dimension rather than audienceResourceName.
    • As a result, these user counts should reflect the full data available for the property rather than the limited data available for the corresponding audiences.

Implementation Brief

Note: This issue may have some implementation shared with the related issue #8144.

  • Update assets/js/modules/analytics-4/components/audience-segmentation/dashboard/AudienceSelectionPanel/AudienceItems.js, availableAudiences useSelect:
    • Filter the audiences into two separate arrays, siteKitAudiences and userAudiences, using the isSiteKitAudience selector.
    • Check whether either of the siteKitAudiences is in the partial data state using the isAudiencePartialData selector, assigning the result to a variable isSiteKitAudiencePartialData:
      • If isSiteKitAudiencePartialData is true:
        • Run a newVsReturning dimension report using getReport, with the dates, the metric totalUsers and the dimension newVsReturning. No need to pass any dimension filters.
        • Destructure this report's rows as newVsReturningRows.
      • Update the return statement: continue to map over the full audience list in the audiences array, but if isSiteKitAudiencePartialData is true, set userCount for the Site Kit audiences using the rows from the newVsReturning report. Ensure that rows where newVsReturning is (not set) are ignored, taking values only from rows that have the dimension value new or returning.
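
The row handling described above could be sketched roughly as follows. This is a minimal illustration rather than the merged implementation: the helper names, the `audienceSlug` values `new-visitors` and `returning-visitors`, and the fallback to `0` when a row is missing are all assumptions. The row shape (`dimensionValues` / `metricValues`) follows the GA4 Data API report response.

```javascript
// Dimension values to keep; rows such as "(not set)" are ignored.
const NEW_VS_RETURNING_VALUES = [ 'new', 'returning' ];

// Assumption: these slugs identify the two Site Kit audiences.
const SLUG_TO_DIMENSION_VALUE = {
	'new-visitors': 'new',
	'returning-visitors': 'returning',
};

// Reduce the report rows to an object keyed by "new" / "returning".
function getNewVsReturningUserCounts( newVsReturningRows = [] ) {
	const counts = {};
	for ( const row of newVsReturningRows ) {
		const dimensionValue = row?.dimensionValues?.[ 0 ]?.value;
		if ( NEW_VS_RETURNING_VALUES.includes( dimensionValue ) ) {
			counts[ dimensionValue ] =
				Number( row?.metricValues?.[ 0 ]?.value ) || 0;
		}
	}
	return counts;
}

// Map over the full audience list, overriding userCount only for the
// Site Kit audiences and only while they are in the partial data state.
function applyFallbackUserCounts(
	audiences,
	isSiteKitAudiencePartialData,
	newVsReturningRows
) {
	if ( ! isSiteKitAudiencePartialData ) {
		return audiences;
	}
	const counts = getNewVsReturningUserCounts( newVsReturningRows );
	return audiences.map( ( audience ) => {
		const dimensionValue = SLUG_TO_DIMENSION_VALUE[ audience.audienceSlug ];
		if ( dimensionValue === undefined ) {
			// Not a Site Kit audience: keep its own user count.
			return audience;
		}
		// Assumption: fall back to 0 if the report has no matching row.
		return { ...audience, userCount: counts[ dimensionValue ] ?? 0 };
	} );
}
```

Keeping the row filtering in its own function means the "(not set)" rule can be tested without rendering the Selection Panel.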

Test Coverage

  • Add a new story to assets/js/modules/analytics-4/components/audience-segmentation/dashboard/AudienceSelectionPanel/index.stories.js, for partial data state for newVsReturningAudiences.

QA Brief

  • Set up Site Kit with the audienceSegmentation feature flag enabled, and connect Analytics to a property which has data.
  • Click on Enable groups in the Audience Segmentation Setup CTA Banner to set up the feature - ensure the "New visitors" and "Returning visitors" audiences are created.
  • Click on Change groups to open the Audiences Selection Panel. The user counts for "New visitors" and "Returning visitors" should match the total user counts for the property (i.e. the counts for the "new" and "returning" values of the newVsReturning dimension), rather than the zero user counts of the newly created audiences.

Note: When first landing on the dashboard after connecting Analytics, you may see a couple of failed requests to save-resource-data-availability-date. The fix for this is outside the scope of this issue, and there is a separate issue for it: #8888.

Changelog entry

  • Add a fallback mechanism to obtain user count for Site Kit audiences in the partial data state.
@techanvil techanvil added Module: Analytics Google Analytics module related issues P1 Medium priority Type: Enhancement Improvement of an existing feature labels Jan 25, 2024
@techanvil techanvil self-assigned this Mar 15, 2024
@techanvil techanvil removed their assignment Mar 15, 2024
@ivonac4 ivonac4 added the Team M Issues for Squad 2 label Apr 9, 2024
@benbowler benbowler assigned benbowler and unassigned benbowler May 30, 2024
@techanvil techanvil self-assigned this May 31, 2024
@techanvil
Collaborator Author

techanvil commented May 31, 2024

Thanks @benbowler, this IB is heading in the right direction. A few points:

@techanvil
Collaborator Author

Thanks @benbowler. This IB LGTM ✅

Please note, I made a small amendment to make it clear we want to ignore rows where the newVsReturning dimension is (not set). I've also bumped the estimate to an 11, as I think testing could potentially be a bit time consuming, and arguably takes this over a "legit" 3.

@kelvinballoo
Collaborator

kelvinballoo commented Jun 26, 2024

QA update ⚠️

Hi @techanvil cc: @wpdarren @nfmohit , I have a list of questions around this ticket which I hope you can help out with:

QUESTION 1:
For the second point of the QAB:
Click on Enable groups in the Audience Segmentation Setup CTA Banner to set up the feature - ensure the "New visitors" and "Returning visitors" audiences are created.

Do I need to use a property which doesn't have 'New visitors' and 'Returning visitors' at all in analytics for SK to create them or it's perfectly fine to use one which has them as existing? Would it make a difference?
Due to limited data, I'm using one which has those audiences as existing.


QUESTION 2:
Based on my understanding from the design doc, those 2 new audiences 'New' and 'Returning' are special audiences which are avoiding partial data state. The conflicting thing for me is that currently, the tiles are showing 'Partial data'. I assume these tags should not show up?

Screenshot 2024-06-26 at 16 30 06

QUESTION 3:
If you can see from the previous screenshot, the 2nd tile has no data. This is a known issue and I raised a ticket under : #8921
I am not sure when this is going to be fixed but does it make sense to fix it together for this issue?


QUESTION 4:
The tile data isn't matching that of the selection panel.
Refer to the screenshot below: New visitors has 6.4K visitors on the selection panel but only 1.7K on the tile.
If I amend the timeframe on the SK dashboard, only the selection panel data will change and not the tile data.
I assume we have to fix this.

Screenshot 2024-06-26 at 16 35 33

QUESTION 5:
I've set the SK dashboard to 'Last 28 days' and applied the same timeframe for the Analytics dashboard but the numbers don't match.
Refer to the screenshots below: e.g. Returning visitors is 199 on analytics but on selection panel, it's 692.

Screenshot 2024-06-26 at 16 38 52 Screenshot 2024-06-26 at 16 38 43

QUESTION 6:
The math doesn't add up in the selection panel.
I was assuming that New + Returning = All visitors.

But if I do 692 + 6400, it's closer to 7000 rather than 6500.
I understand figures like 6.4K could be rounded but let's say it's essentially at 6300 for a worst case. If we add 692 to that, it's still around 7000 and not 6.5K.

Screenshot 2024-06-26 at 16 42 21

Let me know if you need access to my test site for further investigation.
I am also wondering whether the QAB needs updating based on the questions above.

@techanvil
Collaborator Author

techanvil commented Jun 27, 2024

Thanks @kelvinballoo! I will try to clear things up for you.

QUESTION 1: For the second point of the QAB: Click on Enable groups in the Audience Segmentation Setup CTA Banner to set up the feature - ensure the "New visitors" and "Returning visitors" audiences are created.

Do I need to use a property which doesn't have 'New visitors' and 'Returning visitors' at all in analytics for SK to create them or it's perfectly fine to use one which has them as existing? Would it make a difference? Due to limited data, I'm using one which has those audiences as existing.

It's fine to use a property which already has the new/returning visitor audiences set up, the main thing is to ensure they are in the partial data state so you can see a difference in the figures between the actual Audiences in GA4 (i.e. via reports using the audienceResourceName dimension) vs the figures we display in SK for this special case (via a report using the newVsReturning dimension).

Of course, if you already have them created, the audiences probably won't have zero user counts as mentioned in the QAB - I just specified the QAB this way for simplicity and to accentuate the difference between the actual audience figures vs the special case figures.

QUESTION 2: Based on my understanding from the design doc, those 2 new audiences 'New' and 'Returning' are special audiences which are avoiding partial data state. The conflicting thing for me is that currently, the tiles are showing 'Partial data'. I assume these tags should not show up?

Please bear in mind this issue only handles the metrics in the Selection Panel. The Audience Tiles will still show the actual audience data and the partial data badge until issue #8144 is implemented.

QUESTION 3: If you can see from the previous screenshot, the 2nd tile has no data. This is a known issue and I raised a ticket under : #8921 I am not sure when this is going to be fixed but does it make sense to fix it together for this issue?

This is out of scope for this issue as per the above. However, thanks for raising the bug - it's important to fix, I've added AC and moved it to the IB column.

QUESTION 4: The tile data isn't matching that of the selection panel. Refer to the screenshot below: New visitors has 6.4K visitors on the selection panel but only 1.7K on the tile. If I amend the timeframe on the SK dashboard, only the selection panel data will change and not the tile data. I assume we have to fix this.

As per the above the tiles are out of scope here and will be updated via issue #8144. This difference is expected for now.

QUESTION 5: I've set the SK dashboard to 'Last 28 days' and applied the same timeframe for the Analytics dashboard but the numbers don't match. Refer to the screenshots below: e.g. Returning visitors is 199 on analytics but on selection panel, it's 692.

You are comparing the wrong data. When the real audiences are in the partial data state and the special case handling kicks in, we retrieve the data for these "audiences" via a report that uses the newVsReturning dimension rather than audienceResourceName. Using audienceResourceName means we'll get data associated with the Audiences as you see them in GA4. However, using newVsReturning is a completely separate thing. You can retrieve this data in a report using that dimension or see it in the Retention report on GA4:

image
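
To make the distinction concrete, here is a rough sketch of the two kinds of request, written in the shape of a GA4 Data API runReport request body. How Site Kit's getReport expresses these options is not shown in this thread, and the function names, dates and audience resource name below are hypothetical placeholders.

```javascript
// Report scoped to specific GA4 Audiences: these are the figures you see
// for the Audiences themselves, which are limited while the audiences are
// in the partial data state.
function buildAudienceReportRequest( dateRange, audienceResourceNames ) {
	return {
		dateRanges: [ dateRange ],
		metrics: [ { name: 'totalUsers' } ],
		dimensions: [ { name: 'audienceResourceName' } ],
		dimensionFilter: {
			filter: {
				fieldName: 'audienceResourceName',
				inListFilter: { values: audienceResourceNames },
			},
		},
	};
}

// Special-case report: property-wide new/returning split, no audience
// filter at all, so it reflects the full data available for the property.
function buildNewVsReturningReportRequest( dateRange ) {
	return {
		dateRanges: [ dateRange ],
		metrics: [ { name: 'totalUsers' } ],
		dimensions: [ { name: 'newVsReturning' } ],
	};
}

// Hypothetical values, for illustration only.
const exampleDateRange = { startDate: '2024-05-30', endDate: '2024-06-26' };
const audienceRequest = buildAudienceReportRequest( exampleDateRange, [
	'properties/1234/audiences/5678',
] );
const newVsReturningRequest =
	buildNewVsReturningReportRequest( exampleDateRange );
```

The two requests share the same metric and dates but differ in dimension and filter, so there is no reason to expect their totals to agree.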

QUESTION 6: The math doesn't add up in the selection panel. I was assuming that New + Returning = All visitors.

But if I do 692 + 6400, it's closer to 7000 rather than 6500. I understand figures like 6.4K could be rounded but let's say it's essentially at 6300 for a worst case. If we add 692 to that, it's still around 7000 and not 6.5K.

This catches everyone out at first! It is not the case that new + returning = all visitors. In fact, visitors can be new within a timeframe, and then visit the site again and be flagged as a returning visitor as well.
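
A toy example of why the sum overshoots, assuming a simplified model where each session is tagged as new or returning (this is an illustration, not GA4's actual computation, and the session data is made up):

```javascript
// Hypothetical sessions within one reporting window. User "a" first
// visits during the window (new) and then comes back (returning).
const sessions = [
	{ userId: 'a', type: 'new' },
	{ userId: 'a', type: 'returning' },
	{ userId: 'b', type: 'new' },
	{ userId: 'c', type: 'returning' },
];

// Count distinct users per type and distinct users overall.
function countUsers( sessionList ) {
	const byType = { new: new Set(), returning: new Set() };
	const all = new Set();
	for ( const { userId, type } of sessionList ) {
		byType[ type ].add( userId );
		all.add( userId );
	}
	return {
		newUsers: byType.new.size,
		returningUsers: byType.returning.size,
		totalUsers: all.size,
	};
}

const { newUsers, returningUsers, totalUsers } = countUsers( sessions );
console.log( newUsers, returningUsers, totalUsers ); // 2 2 3
```

User "a" is counted in both groups, so new + returning is 4 while the number of distinct visitors is 3.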

Here is a Slack thread where this is discussed which also links out to this support query. It's also mentioned on the design doc, and is the reason we have the new/returning audience tooltips text e.g.:

image

Hope that clarifies things, please let me know if you have any further questions!

@techanvil techanvil removed their assignment Jun 27, 2024
@kelvinballoo
Collaborator

QA update ⚠️

Thanks for answering the questions @techanvil . Noted on all the points and they make sense.
I will test the relevant points when the other tickets are ready.

As for this ticket, it feels like the main thing is to check the selection panel figures against the retention report.

I noted the returning visitors was 691 on the audience selection panel.
However, in the retention report, it's 696.

I believe this is most likely a time gap issue relating to when we retrieve the data and how often we retrieve it again. Do you feel there is any gap to be filled here? Or do we have plans to update the retrieval logic in another ticket?
Happy to create a ticket, if there is a need.

Screenshot 2024-06-27 at 15 07 29 Screenshot 2024-06-27 at 15 07 38

@techanvil
Collaborator Author

Thanks @kelvinballoo. It's unlikely to be a time gap issue as the end date for our reports is yesterday rather than the current day.

It's more likely these differences can be explained by the last point in my comment on #7214.

Essentially we should expect some minor discrepancies when comparing API reports to GA4 UI Reports and Explorations.
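
For reference, a date range that ends yesterday could be computed along these lines. This is only a sketch: the function name is hypothetical, and Site Kit's actual date range calculation (including how it counts the start date) is not shown in this thread.

```javascript
// Return an inclusive range of `days` days ending the day before `today`.
// Dates are handled in UTC and formatted as YYYY-MM-DD.
function getDateRangeEndingYesterday( today, days = 28 ) {
	const toISODate = ( date ) => date.toISOString().slice( 0, 10 );

	const end = new Date( today );
	end.setUTCDate( end.getUTCDate() - 1 );

	const start = new Date( end );
	start.setUTCDate( start.getUTCDate() - ( days - 1 ) );

	return { startDate: toISODate( start ), endDate: toISODate( end ) };
}

const range = getDateRangeEndingYesterday( new Date( '2024-06-27T00:00:00Z' ) );
console.log( range ); // { startDate: '2024-05-30', endDate: '2024-06-26' }
```

Because the range excludes the current day, data still arriving for today would not explain a difference between the panel and a GA4 report over the same dates.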

@kelvinballoo
Collaborator

QA Update ✅

Thanks for clarifying @techanvil .
Noted on that.
With that, I'm marking this ticket as a pass as some minor discrepancies are expected and it isn't a wide gap anyway.

The Selection Panel figures for 'New visitors' and 'Returning visitors' line up with the GA retention report, albeit with a minor discrepancy for 'Returning visitors', which is expected.

Moving ticket to approval.

Screenshot 2024-06-27 at 15 07 29 Screenshot 2024-06-27 at 15 07 38

@kelvinballoo kelvinballoo removed their assignment Jun 27, 2024