Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose gathering data state on page load #5933

Closed
aaemnnosttv opened this issue Sep 27, 2022 · 19 comments
Closed

Expose gathering data state on page load #5933

aaemnnosttv opened this issue Sep 27, 2022 · 19 comments
Labels
Exp: SP P1 Medium priority PHP Type: Enhancement Improvement of an existing feature

Comments

@aaemnnosttv
Copy link
Collaborator

aaemnnosttv commented Sep 27, 2022

Feature Description

Until now, the concept of gathering data is something that we've calculated at runtime based on reporting data. In some cases, such as determining an initial layout it can be necessary to know whether a module is gathering data or not right away in order to avoid substantial changes as data is loaded.

While the goal is to make reporting data available right away on page load, we should do so in a way which keeps the fetching of such data async – that is, we should not be making API requests to query the gathering data state from the request serving the page itself.

Ideally this state would be available for the first page load once after module is connected, but if we can't, it should be acceptable to have the gathering state immediately available for the next page load.

As part of this issue, it would be good to revisit the concept of gathering data itself which is generally a state that a module should only be in once – regardless of how long – for a given configured entity. E.g. If a module is connected and then we detect it is no longer gathering data, it shouldn't be possible for that module to go back into a gathering data state again unless it's connected entity (i.e. data source) changed. With that in mind, every eligible module would always start in a gathering state. As soon as we're able to determine that the module is no longer in a gathering state, this would be persisted, after which we would no longer need to check. If the connection were later changed, the gathering state would be reset and the same process would repeat. This approach would be more efficient than the current implementation which only derives gathering state from requested report data (cached or freshly requested), so these queries are repeated every time the cache expires even after we detect the module is no longer gathering data.

This issue is relevant for the upcoming work for the Summary Widget (#5908) but not explicitly part of that epic.


Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

  • The "Gathering Data" state for Analytics and Search Console should be enhanced with a persistent server-side value which is available to the client on load
    • The server-side value should be implemented as a non-expiring Transient (googlesitekit_{$module_slug}_data_available), as the state itself is easily recalculatable)
      Note: this was amended before the release to expire after 1 hour, to be addressed in Potentially incorrect data available state being persisted on temporary API error #6698
    • The stored value should be a simple flag that is set when data for the module is available (i.e. not-gathering)
      • There is no need to store a value when data is in a gathering state, this is only useful to know ahead of time that there already is data
    • The derivation of the gathering data state should still be determined on the client as it is today
  • The module's data availability transient should be deleted any time the connected entity changes (see below), the module is disconnected, or Site Kit is reset. Relevant entities are:
    • Search Console property
    • GA3 profile
    • GA4 measurement ID
  • A new POST:data-available module datapoint should be added for modules that expose the state
    • This will be called to persist the available state. There is no particular parameter needed here because it should only set that data is available via the *_data_available transient
    • When data is gathering (or unknown), the transient should not exist
  • The server-side data availability state should be selectable by a new isDataAvailableOnLoad selector which should always reflect the value on page load – it should not update during the page lifecycle under normal circumstances, i.e. if data is determined to be available
    • Since this value will be available on load, it should be set in the store's initialState – it shouldn't be received by a resolver. For testing purposes, it could have a respective action to receive it though if useful.
  • isGatheringData selectors should receive corresponding state and resolvers. Selectors should be updated to return the value from state similar to other simple getters
    • A new action will be needed to receiveIsGatheringData(bool) (this can be used to simplify stories and tests for this state quite a bit but this should happen in a follow up issue)
    • In the new resolver:
      • If isDataAvailableOnLoad is true, the state should be received as false and do nothing else
      • If isDataAvailableOnLoad is false, it should select the same report it does today, and calculate the gathering state
        • If isGatheringData === true, the state should receive true and do nothing else
        • If isGatheringData === false, the state should receive false AND dispatch a fetch request to persist the data availability via the new module-specific REST route POST:modules/:identifier/data/data-available
          Note that this request should only be made when data was not available on load, and was then detected to be available

Note: this may introduce failures in E2E tests due to new/unhandled fetch requests when data is available.

Implementation Brief

On PHP Side

In Google\Site_Kit\Core\Modules/Module class.

  • Add a protected $transients instance variable.
    • Populate it in the constructor with a Google\Site_Kit\Core\Storage\Transients instance.

Create Module_With_Data_Available_State interface in Google\Site_Kit\Core\Modules namespace.

  • It will have the following public methods:
    • is_data_available - returns whether the data_available transient is set.
    • set_data_available - sets the data_available transient.
    • reset_data_available - resets the data_available transient by deleting it.

Create Module_With_Data_Available_State_Trait trait in Google\Site_Kit\Core\Modules namespace.

  • Add a protected get_data_available_transient_name method:
    • It will return googlesitekit_{$module_slug}_data_available, replacing {$module_slug} with $this-slug.
  • Implement is_data_available method.
    • It should return the result of calling get method on $this-transients with get_data_available_transient_name() as the first argument.
  • Implement set_data_available method.
    • It should call set method on $this-transients with get_data_available_transient_name() as the first argument and true as the second argument.
  • Implement reset_data_available method.
    • It should call delete method on $this-transients with get_data_available_transient_name() as the first argument.

In Google\Site_Kit\Core\Assets\Assets:

  • Add a new private get_inline_modules_data method.
    • It should apply googlesitekit_inline_modules_data filter to an empty array and return it.
  • In the get_assets method:
    • Add a new Script_Data asset in the $assets array with googlesitekit-modules-data handle with the following args:
      • global : _googlesitekitModulesData.
      • data_callback: it should be a function that returns the result of $this->get_inline_modules_data().

In Google\Site_Kit\Core\Modules\Modules:

  • Add a new private inline_modules_data method:
    • Get all the active module using $this->get_active_modules().
    • Filter out the active modules with that are instance of Module_With_Data_Available_State.
    • Return an array with data_available_{$module->slug} as key and $module->is_data_available as value for the filtered modules.
  • in the register method:
    • Add the data to the googlesitekit-modules-data Data Script using the googlesitekit_inline_modules_data filter and inline_modules_data method.

In Google\Site_Kit\Modules\Search_Console, Google\Site_Kit\Modules\Analytics and Google\Site_Kit\Modules\Analytics_4 class:

  • Add Module_With_Data_Available_State interface and Module_With_Data_Available_State_Trait
  • In setup_assets method:
    • Add the new googlesitekit-modules-data as a dependency of the modules asset.
  • In register method:
    • Add a update_option_googlesitekit_{search-console/analytics/analytics-4}_settings action to compare the current and new search console property/ga3 property/ga4 measurement id and call reset_data_available method if they are different.
  • In get_datapoint_definitions method:
    • Add POST:data-available definition.
  • In create_data_request method:
    • Call set_data_available method to set the data_available transient.

In Google\Site_Kit\Modules\Analytics and Google\Site_Kit\Modules\Analytics_4 classes:

  • In addition to the things done in the above steps, also add the following:
  • In on_deactivation method:
    • Call reset_data_available method to reset the data_available transient.

On JS Side

In assets/js/modules/search-console/datastore/report.js and assets/js/modules/analytics/datastore/report.js:

  • Add the serverside data available state from the global _googlesitekitModulesData to the initialState object.
  • Add a new selector isDataAvailableOnLoad which returns the value of dataAvailableOnLoad from the state.
  • Update the isGatheringData selectors according to the details provided in the AC, notably:
    • It should have a state and a resolver. The resolver should do the following:
    • If isDataAvailableOnLoad is true, then isGatheringData should return false.
    • Otherwise, it should determine the gatheringDataState by similar means as it does currently.
      • If gatheringDataState is true, then isGatheringData should return true and do nothing else.
      • Otherwise, call the the newly added endpoint POST:data-available using API.set() and reurn false.

Test Coverage

  • Add contract tests for Module_With_Data_Available_State interface.
  • Add unit test to verify the data_available state is reset when relevant entity changes and module is deactivated.
  • Add unit test for the newly added selectors.
  • Update existing tests for isGatheringData to accommodate the changes.
  • Fix any failing Unit or e2e tests.

QA Brief

This is a big one and the following needs to be thoroughly tested. Although not strictly QA:Eng, an engineers' help may be needed.

  • Test and make sure widget status and banners for sites in gathering data and zero data continues to work as before and there's no regression.
    • This should be checked for Search Console, Analytics and Analytics 4.
  • Check the newly introduced selector isDataAvailableOnLoad using the following browser console commands.
    • For Search Console: googlesitekit.data.select('modules/search-console').isDataAvailableOnLoad()
    • For Analytics: googlesitekit.data.select('modules/analytics').isDataAvailableOnLoad()
    • For Analytics 4: googlesitekit.data.select('modules/analytics-4').isDataAvailableOnLoad()
    • For any site:
      • Running this selectors on first page load after setup should return false.
    • For Sites in gathering data:
      • It will continue to return false.
    • For sites that have data (including zero data):
      • Observe the network tab of developer tools after setting up and returning to dashboard. a POST request should be made to data-available endpoint once (for each module that is not in a gathering state).
      • From next page refresh, the selector should return true.
      • Caveat for GA4:
        • Until Conditionally render GA4 metrics on SK dashboards #6220 is merged, are not using any GA4 widget in SK Dashboard. So, the isGatheringData selector for GA4 module must be called manually on first page load to kickstart the isDataAvailableOnLoad using the following snippet.
        • googlesitekit.data.select('modules/analytics-4').isGatheringData();
        • If the ga4Reporting feature flag is not enabled, the GET:report API call will fail and the isGatheringData will return false. So, for a more authentic experience, enable the feature flag. That way, the data available will only be persisted for sites with actual data on GA4 property.
    • Changing the property id/measurement id should make the selector return false first time, and behave accordingly (as mentioned above) depending on the gathering data state of new property.
    • Deactivating and reactivating module will also reset the value to false and do the same as above.

Changelog entry

  • Expose gathering data state on page load.
@aaemnnosttv aaemnnosttv added P1 Medium priority Type: Enhancement Improvement of an existing feature labels Sep 27, 2022
@felixarntz
Copy link
Member

@aaemnnosttv I've added ACs here, what do you think? I think the approach of using a transient for the temporary value vs an option for the persistent value might make sense?

@felixarntz felixarntz removed their assignment Sep 28, 2022
@aaemnnosttv
Copy link
Collaborator Author

@felixarntz I've revised the AC a bit after giving this much thought. It's slightly different than we discussed recently but should be the smallest change I think. For cases where we want to know immediately if data is available, we will have a dedicated selector to use in those situations. Otherwise the existing selector will work mostly the same as today but a bit more efficiently since it won't request the report data any more once data is detected to be available.

This is also rather technically detailed so there likely isn't a lot more detail to add in an IB although things like outlining how we'll clear the transient on an entity change or how the state will be passed from the server would be good to cover.

Let me know what you think?

@FlicHollis
Copy link
Collaborator

I have added this to the User Input v2 epic, as although it is not directly related to the epic, it does block 5895 and needs to be completed for the epic to be completed.

@eclarke1
Copy link
Collaborator

eclarke1 commented Nov 3, 2022

@felixarntz please could I chase for your review here so we can get this one moving? It is blocking several other issues so would be great to please get it over to EB ASAP. Thank you

@felixarntz
Copy link
Member

ACs ✅

@felixarntz felixarntz removed their assignment Nov 3, 2022
@eclarke1
Copy link
Collaborator

@kuasha420 could we please progress the IB here as it would be ideal to get this into Sprint 90 as the last issue remaining (other than #6180 which is just for the HaTS survey)
cc @FlicHollis

@aaemnnosttv
Copy link
Collaborator Author

Create an interface Module_With_Data_Available_State in Google\Site_Kit\Core\Modules namespace.

  • It will have the following public methods:

@kuasha420 I like the direction here 👍 I noticed you only mentioned the interface and then implement it in each of the module classes. This is why many Module_With_ interfaces have corresponding traits where the implementation is the same. I think a trait would help reduce a lot of the duplication here as well.

  • setup_transients - sets the $transients instance variable.

As a "generic" method name, I would expect this to be on a more generic interface. It probably shouldn't be part of the interface though as this is more of an implementation detail. It might make sense to include on the corresponding trait. Rather than setting the transients, it might be cleaner to have a method to get the Transients instance instead which would be used from the other methods.

  • delete_data_available - deletes the data_available transient.

How about reset_data_available instead? Deleting is leaking some unnecessary implementation detail. The interface should be relatively independent in that regard.

  • Add DATA_AVAILABLE_TRANSIENT constant with value defined in the AC.

Since this has to be dynamic to each module, I would suggest using a protected method for this instead which can live on the trait and would use $this->slug for the module slug.

  • Call is_data_available method to check if the data_available transient is set and append the information to the googlesitekitBaseData using googlesitekit_inline_base_data filter.

I'm not sure if googlesitekitBaseData is the right place for this state but I can see how it's the most appropriate from the current options. How about we create a new script data asset like _googlesitekitModulesData which we could then add as a dependency to each module's asset, that way it would only be loaded when a dependent module's asset is enqueued. The object's data itself could be populated by the Modules class which could iterate over each active module and if it implemented the necessary interface, add a key for data_available_{slug} set by the module's method. What do you think?

  • Add a googlesitekit_pre_save_settings_{module_slug} action to compare the current and new search console property/ga3 property/ga4 measurement id and call delete_data_available method if they are different.

I'm not sure why this hook (and the related googlesitekit_save_settings_{slug}) were added, core already has pre/post option update hooks and this one is specific to REST requests only which would be insufficient for the AC. These don't even seem to be used! 🤔 Ideally, I think we should have an action for when a module's connected entity changes as this is something we've needed in a few places now. That should happen in a separate issue though; I'll create a new issue for that.

In create_data_request and parse_data_response methods:

  • Call set_data_available method to set the data_available transient.

This would only be relevant in create_data_request. parse_data_response is only really used for API requests and alters the response, it shouldn't be used to persist data, etc.


Regarding tests, each interface has a corresponding set of contract tests which are defined in a trait that each module's test case uses. Let's do the same here.

Otherwise, nice work! – this is a rather complicated one! 👍

@kuasha420
Copy link
Contributor

Thank you @aaemnnosttv for the review and valuable feedback. I have updated the IB to address your feedback and suggestions. Please feel free to have a look. Cheers!

@aaemnnosttv
Copy link
Collaborator Author

Thanks @kuasha420 – apologies for the delay here – this is really close, the only thing I think there is to finalize here is around the internal use of transients (methods, etc). I have a few ideas that I'll follow up with here shortly.

@aaemnnosttv
Copy link
Collaborator Author

Create Module_With_Data_Available_State_Trait trait in Google\Site_Kit\Core\Modules namespace.

  • Add a protected $transients instance variable.
  • Add an abstract setup_transients method that returns Transients instance.
  • Add a protected get_transients method:

After giving this some more thought, I think the simplest solution is to add a Transients instance on the base Module class. They already have Options and User_Options so this fits into what a module has access to. This could be instantiated in the constructor given the module's Context instance.

The Site_Verification module already has a few uses of transients inline, so these could be refactored at the same time.

With this added, the trait can assume transients are available like any other property on Module. This should simplify things a fair bit I think.

  • Add a protected get_transient_slug method

This method name sounds generic but the usage here is specific to the transient for the data-available state. Let's name it more specifically to what it's for, perhaps get_data_available_transient_name? (internally it's referred to as a name rather than a slug)

  • Add a update_option_googlesitekit_{search-console/analytics/analytics-4}_settings action to compare the current and new search console property/ga3 property/ga4 measurement id and call reset_data_available method if they are different.

I'd like to see this implemented in a way that doesn't duplicate this across modules. This might even justify a prerequisite issue, but we do a similar thing in a few places. I'm thinking we could add a new on_update( callable ) instance method to the Setting class. This would be a wrapper for a few add_action calls, similar to Setting_With_Owned_Keys_Trait::register_owned_keys in that it would hook on update_option_{$option} and add_option_{$option}" (since WP treats these as separate). Then in the Module_With_Data_Available_State_Trait we could have a method to register_data_available_reset which could take the keys to "observe" in the listener callback as an argument. These could probably even be provided by $this->get_owned_keys() in this case, but we might want to be more specific about it and only pass the specific keys in the AC. Happy to discuss this part more or split out a separate issue for the infra here but let me know what you think.

@kuasha420 kuasha420 assigned aaemnnosttv and unassigned kuasha420 Feb 21, 2023
@aaemnnosttv aaemnnosttv assigned kuasha420 and unassigned aaemnnosttv Feb 24, 2023
@kuasha420 kuasha420 assigned aaemnnosttv and unassigned kuasha420 Feb 27, 2023
@aaemnnosttv aaemnnosttv removed their assignment Mar 1, 2023
@wpdarren wpdarren assigned wpdarren and mohitwp and unassigned wpdarren and mohitwp Mar 1, 2023
@wpdarren
Copy link
Collaborator

wpdarren commented Mar 7, 2023

QA Update: ⚠️

@kuasha420 I have two observations:

  1. I was expecting to see an entry for analytics_4 in the network table under data-available but all I see are search-console and analytics. See screenshot below.

image

  1. According to the QAB:

For sites that have data (including zero data):
Observe the network tab of developer tools after setting up and returning to dashboard. a POST request should be made to data-available endpoint once (for each module that is not in a gathering state).
From next page refresh, the selector should return true.

When I run googlesitekit.data.select('modules/analytics-4').isDataAvailableOnLoad() in the console, I see false.

image

Note: no feature flags are enabled. UA and GA4 properties are created for Analytics. I tried this for a site with zero data and also a site with data.

Please could you confirm if these two observations are expected?

@kuasha420
Copy link
Contributor

@wpdarren following up on our call, I have updated the QAB to add special instruction for GA4. Cheers!

@kuasha420 kuasha420 removed their assignment Mar 7, 2023
@wpdarren
Copy link
Collaborator

wpdarren commented Mar 8, 2023

QA Update: ✅

Verified:

  • The gathering and zero data state notification banners appear as expected and behave as usual.
  • Checked for Search Console, Analytics and Analytics 4 for the newly introduced selector isDataAvailableOnLoad using the commands in the QAB.
    • For any site: Running this selectors on first page load after setup returns false.
    • For Sites in gathering data: It returns false.
    • For sites that have data, a POST request is made to data-available endpoint for each module that is not in a gathering state. Checked to also make sure that it does not appear in gathering state.
    • From next page refresh for the modules, the selector returns true.
    • Note for GA4: Enabled the GA4Reporting feature flag. and the data-available endpoint appears for GA4 and the selector returns true.
  • Changing the property id/measurement id makes the selector return false first time, and behave accordingly depending on the gathering data state of new property. Deactivating and reactivating module also resets the value to false.
Screenshots

image
image
image
image
image

@aaemnnosttv
Copy link
Collaborator Author

⚠️ As discussed on the call earlier today, we want to make a small tweak to this one to avoid setting a persistent non-gathering state in the event of an error. I will start a quick follow-up PR to address this.

@aaemnnosttv
Copy link
Collaborator Author

aaemnnosttv commented Mar 9, 2023

The changes I mentioned above turned out to be a bit more involved than anticipated (mostly regarding changes to tests) but also there were some concerns about the changes to the report-based isGatheringData and isZeroData returning undefined after the underlying request was completed. This should be okay but isn't a change we want to introduce so soon before the release.

For that reason, we've amended the initial changes here to include an expiration for the persistent gathering state to avoid permanently setting it based on a non-successful request. We'll fix this to arrive on the original permanent persistent value in #6698.

Everything else is the same here and this change is essentially a one line change that can be quickly QA'd by an engineer.

@aaemnnosttv
Copy link
Collaborator Author

QA ✅

Confirmed that the transient is set to expire in 1 hour now (matching client side storage) rather than no expiration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Exp: SP P1 Medium priority PHP Type: Enhancement Improvement of an existing feature
Projects
None yet
Development

No branches or pull requests

7 participants