Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

amp-analytics reports wrong cid when multiple tags present #25525

Closed
mattwomple opened this issue Nov 9, 2019 · 13 comments · Fixed by #26130
Closed

amp-analytics reports wrong cid when multiple tags present #25525

mattwomple opened this issue Nov 9, 2019 · 13 comments · Fixed by #26130

Comments

@mattwomple
Copy link
Member

What's the issue?

Session stitching from non-AMP to AMP pages breaks if the AMP page contains multiple <amp-analytics> tags and the first tag uses CLIENT_ID() to set a variable.

How do we reproduce the issue?

Create a non-AMP page that uses Google Analytics via analytics.js.

Create an AMP page with <amp-analytics> utilizing CLIENT_ID() to set a variable, e.g.
"vars": { "vid": "CLIENT_ID(AMP_ECID_GOOGLE)" }. Append <amp-analytics type="googleanalytics"> just after the first <amp-analytics> tag.

  1. Visit the canonical page in a clean browser session.
  2. Check the https://www.google-analytics.com/r/collect beacon for the cid reported.
  3. Visit the AMP page.
  4. Check the https://www.google-analytics.com/r/collect beacon for the cid reported.

Here is a minimal example the demonstrates the issue.

  1. Clear all site data for cmphys.com
  2. Visit android.cmphys.com/temp/analytics-stitching-canonical.html
  3. Note the cid set in cookie _gl and sent in beacon www.google-analytics.com/r/collect
  4. Click the link "Visit the AMP page with <amp-analytics type="googleanalytics"> last"
  5. Check the new cid sent in beacon www.google-analytics.com/r/collect

The order of the <amp-analytics> scripts changes the outcome. If you put <amp-analytics type="googleanalytics"> before the custom <amp-analytics> tag, the correct cid is used in the googleanalytics beacon. This can be tested in the sample website above by clicking "Visit the AMP page with <amp-analytics type="googleanalytics"> first" from the non-AMP page.

What browsers are affected?

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36

Which AMP version is affected?

1911062056110

@mattwomple
Copy link
Member Author

/cc @ampproject/wg-analytics

@zhouyx
Copy link
Contributor

zhouyx commented Nov 14, 2019

Thanks for reporting. That definitely sounds like a bug to me. @micajuine-ho let's dig deeper into this after the CCPA work.

@mbrowne
Copy link

mbrowne commented Nov 26, 2019

Does it still happen when using the amp-analytics variable instead of the platform variable, e.g.:

{ "vid": "${clientId(AMP_ECID_GOOGLE)}" }

(...assuming that I'm reading the docs correctly and this is valid syntax in the first place)

@mattwomple
Copy link
Member Author

Sorry for the delay @mbrowne . I just confirmed that the issue still exists when using { "vid": "${clientId(AMP_ECID_GOOGLE)}" }. The cid is incorrect the first time the AMP page loads (a new cid like amp-abcabcabc is generated), and then is correct on subsequent visits to the AMP page (the cid generated by the non-AMP page is used).

@micajuine-ho
Copy link
Contributor

I am investigating now. We think it might have something to do with the caching implementation of scope client id. Will update when I make progress

@micajuine-ho
Copy link
Contributor

Hi @mattwomple, it seems that GA creates a _ga cookie, but then the custom amp-analytics tag creates a scope cookie through CLIENT_ID(AMP_ECID_GOOGLE). Then the googleanalytics analytics tag searches for the _ga cookie, but falls back to our amp provided scope cookie.

So both the custom analytics and googleanalytics tag return the amp provided scope cookie.

An easy way to fix this would be in the custom config change CLIENT_ID(AMP_ECID_GOOGLE) to CLIENT_ID(AMP_ECID_GOOGLE,,_ga) so that it will use the _ga cookie.

What do you think about this?

@mattwomple
Copy link
Member Author

@micajuine-ho Thanks for the analysis. Based on your suggestion, I tested the following vid values in the custom amp-analytics config:

  • CLIENT_ID(AMP_ECID_GOOGLE) - cid is incorrect in Google beacon and matches vid in custom beacon
  • CLIENT_ID(AMP_ECID_GOOGLE,,_ga) - cid is correct in Google beacon and matches vid in custom beacon
  • CLIENT_ID(abc) - cid is correct in Google beacon and distinct from vid in custom beacon
  • CLIENT_ID(abc,,_ga) - cid is correct in Google beacon and matches vid in custom beacon

I'll start with: Thanks, I appreciate the workaround. I can share the _ga cookie in my application. I still consider this a workaround though.

Then the googleanalytics analytics tag searches for the _ga cookie, but falls back to our amp provided scope cookie.

Isn't the behavior of CLIENT_ID(x,y,z): 1. search for cid scope definition x, and then if not available, 2. search for value in cookie z? In this case, I can see why Google Analytics might short-circuit the cid lookup from _ga since AMP_ECID_GOOGLE had already been generated by the custom amp config. But, this leaves me a little confused about the inconsistent results when visiting the AMP page on the first visit and on the second visit. Using the sample site in my original post:

  1. Visit https://android.cmphys.com/temp/analytics-stitching-canonical.html to set _ga
  2. Click Visit the AMP page with <amp-analytics type="googleanalytics"> last and note the cid sent to google-analytics.com/r/collect is new (starts with amp-)
  3. Click Visit the non-AMP page
  4. Click Visit the AMP page with <amp-analytics type="googleanalytics"> last and note the cid sent to google-analytics.com/r/collect is the same as in Add commit hooks #1 (does not start with amp-)

These inconsistent results when visiting the AMP page suggest there is still something not behaving correctly. Do you agree?

@micajuine-ho
Copy link
Contributor

@mattwomple

Isn't the behavior of CLIENT_ID(x,y,z): 1. search for cid scope definition x, and then if not available, 2. search for value in cookie z?

It actually finds the cookie value z first if present, and then returns the scope value if present. Just a nuance, so essentially the same as you outlined.

These inconsistent results when visiting the AMP page suggest there is still something not behaving correctly. Do you agree?

On the first visit to Visit the AMP page with last:

  1. The custom analytics event is triggered, and it searches for AMP_ECID_GOOGLE, but it does not exist in the browser cache and so it gets created with amp-..., and stored in cache and memory
  2. googleanalytics finds the _ga browser cookie, but then uses the AMP_ECID_GOOGLE cookie stored in memory because it's the same scope

On the second visit:

  1. AMP_ECID_GOOGLE and _ga both exist in browser cache but the in-memory cache is empty.
  2. The custom trigger uses AMP_ECID_GOOGLE and finds the amp-... cookie stored in the browser from earlier and returns it
  3. googleanalytics finds the _ga browser cookie. But this time when it checks the in-memory cache, the scope does not exist, and so it doesn't fallback to the amp-...., and instead uses the correct GA.....

I'm not sure if this is the correct behavior but this it why it's happening. I've been told that the optional cookiename parameter was added for backwards compatibility sake, and that this final design is not the most optimal.

@mattwomple
Copy link
Member Author

@micajuine-ho Thanks for taking the time to spell out the actions taken by amp-analytics on each page load. This is super helpful and will be a great reference for me to refer back to. This code block explains a lot:

if (existingCookie) {
// If we created the cookie, update it's expiration time.
if (/^amp-/.test(existingCookie)) {
setCidCookie(win, cookieName, existingCookie);
}
return /** @type {!Promise<?string>} */ (Promise.resolve(existingCookie));
}

cid.externalCidCache_[scope] is not defined if the cookie exists. So, as you mentioned, each call to CLIENT_ID() returns a distinct value if distinct cookies exist, even if their scope names match.

I can see a few options for improvement. Let me know if you think any of these are worth consideration:

  • cid.externalCidCache_[scope] could be set in this block. That would at least make the cid used in each analytics config consistent on each visit to a page, whether the cookie exists or not. This means that distinct cookies names would be ignored, though. The cookie name associated with the first use of a scope name would be used in all subsequent configs.
  • cid.externalCidCache_[scope] could be set in this block and the condition in this block could be changed from
    if (cid.externalCidCache_[scope]) {
    to
    if (cid.externalCidCache_[scope] && existingCookie) {
    This would allow distinct cid values for matching scopes if the cookie names differ. This may or may not be desirable from the AMP team's perspective, but it actually matches today's behavior with the exception of first page load.
  • cid scope name AMP_ECID_GOOGLE becomes reserved and tied to cookie name _ga. Documentation is updated emphasize that a single scope name must only be used with one cookie name, or unexpected behavior could occur.

@micajuine-ho
Copy link
Contributor

micajuine-ho commented Dec 19, 2019

@mattwomple I think that the third option is the way to go, as it involves no changes to the runtime, which could have unintended side effects to existing users.

Additionally, looking through the our list of vendor configs, there are only 4 analytics vendors who use the macro CLIENT_ID(AMP_ECID_GOOGLE,,_ga) (including googleanalytics and gtag). No other vendors use the the optional cookie name parameter. To me this indicates that it is the intended behavior for these vendors to use the _ga cookie created by googleanalytics.

@zhouyx What are your thoughts on how to proceed?

@zhouyx
Copy link
Contributor

zhouyx commented Dec 20, 2019

Read the thread. Thank you both for the detailed explanation.

Can I summarize the issue to how should CLIENT_ID(scope, , opt_cookieName) behave.

Current order:
cachedCid with same scope generated by other tag > cid from opt_cookieName > cid generated from scope

This caused inconsistency with different tag orders and even with multiple page load.

My understanding is that the cachedCid is a must because we need to keep the value same for all. And then we want to stick to the existing cookie if possible.
The order works correctly as long as all tags pair the AMP_ECID_GOOGLE along with _ga, which in my opinion should always be the case if one wants to use the GA cid.

@lannka Do you recall why we need a scope and an opt_cookieName

Also add @zikas for ideas.

@zhouyx
Copy link
Contributor

zhouyx commented Dec 20, 2019

CID API accepts two scopes. You can think of them as a required off-origin cookie scope (AMP cache), and an optional origin cookie scope.

We started with only one cookie scope, which is the AMP_ECID_GOOGLE here. The optional origin cookie scope was introduced because AMP wants to leverage the existing cookie on the origin (You example also showed this). When the optional origin cookie scope is not defined, the required scope will be used to read and store cookie on the origin instead.

In the GA case, they provide both the AMP_ECID_GOOGLE and _ga because of two reasons.

  1. _ga is the cookie name for all non AMP pages, it would be impossible to change _ga to AMP_ECID_GOOGLE. Even we do so, we’d lost the benefit of getting existing client id on publishers origin.
  2. Changing AMP_ECID_GOOGLE to _ga as you proposed is a good option. However the AMP team didn’t proceed with this because AMP_ECID_GOOGLE had been used for quite some time. Switching to a new scope name meant we would lose all the existing data, and would cause a one time reset of every publishers GA dashboard.
    Due to these same reasons, we are not planning to change the scope name today.

Since we’ve agreed that the provided optional cookie name should override the scope value. I think it makes sense to always return the value from opt_cookieName if it exists.

But still, in my understanding the scope and opt_cookieName come as a set, and should always be set together if one wants to leverage the id from a specific vendor.

@mattwomple
Copy link
Member Author

mattwomple commented Dec 21, 2019

PR #26130 is a reasonable approach to this problem. Cookie first, scope second, else generate new cookie and scope. From my earlier comment, the updated behavior would be:

  • CLIENT_ID(AMP_ECID_GOOGLE) - cid is correct in Google beacon and distinct from vid in custom beacon
  • CLIENT_ID(AMP_ECID_GOOGLE,,_ga) - cid is correct in Google beacon and matches vid in custom beacon
  • CLIENT_ID(abc) - cid is correct in Google beacon and distinct from vid in custom beacon
  • CLIENT_ID(abc,,_ga) - cid is correct in Google beacon and matches vid in custom beacon

Importantly, these results are consistent in both analytics-stitching-amp-googlelast.html and analytics-stitching-amp-googlefirst.html.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants