-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Source Google Ads: Normalization produces improper data since introduction of "segments.hour" in 0.2.3 #20283
Comments
Having the exact same issue here! A quick fix I did is working with |
Yeah that's unfortunate, issue seems quite problematic since it results in incorrect data and forces to downgrade AFAIU. I'm wondering how users using the GADS source haven't hit this issue or realised they're affected. Seems quite a priority for airbyte maintainers to fix it. Tried to help as much as I can with clear explanations and solutions hints but despite that there doesn't seem to be any traction 🤷♂️ |
Downgrading doesn't work anymore as the google ads python lib used by the connector seems now unsupported by the google ads api. Had to apply this patch on ===================================================================
diff --git a/models/_sync/airbyte_incremental/scd/_sync_google_ads/campaigns_google_ads_scd.sql b/models/_sync/airbyte_incremental/scd/_sync_google_ads/campaigns_google_ads_scd.sql
--- a/models/_sync/airbyte_incremental/scd/_sync_google_ads/campaigns_google_ads_scd.sql (revision 474c9895c148a79fc17adb8623e016ea20a541e1)
+++ b/models/_sync/airbyte_incremental/scd/_sync_google_ads/campaigns_google_ads_scd.sql (date 1678738942665)
@@ -105,6 +105,7 @@
{{ dbt_utils.surrogate_key([
adapter.quote('campaign.id'),
adapter.quote('segments.date'),
+ adapter.quote('segments.hour'),
]) }} as _airbyte_unique_key,
{{ adapter.quote('campaign.id') }},
{{ adapter.quote('metrics.ctr') }},
@@ -197,10 +198,12 @@
_airbyte_emitted_at desc
) as _airbyte_end_at,
case when row_number() over (
- partition by {{ adapter.quote('campaign.id') }}, {{ adapter.quote('segments.date') }}
+ partition by {{ adapter.quote('campaign.id') }}, {{ adapter.quote('segments.date') }}, {{ adapter.quote('segments.hour') }}
order by
{{ adapter.quote('segments.date') }} is null asc,
{{ adapter.quote('segments.date') }} desc,
+ {{ adapter.quote('segments.hour') }} is null asc,
+ {{ adapter.quote('segments.hour') }} desc,
_airbyte_emitted_at desc
) = 1 then 1 else 0 end as _airbyte_active_row,
_airbyte_ab_id,
|
Hi!
Thanks a lot for working on this, I will do the updates soon!
Have a great day,
Baudouin
…On Tue, Mar 14, 2023 at 12:00 PM Hans Lemm ***@***.***> wrote:
Hi!
Yesterday, this issue was fixed with PR #23999
<#23999>.
Update your connector *and refresh the source schema* to update the
incremental sync.
—
Reply to this email directly, view it on GitHub
<#20283 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AXKYOM4J3QAZNZLTTNBZISTW4BFWJANCNFSM6AAAAAASZBFLQI>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
This issue was fixed in yesterday's PR for Source Google Ads #23999. Update the connector & refresh source schema to make use of the fixed incremental sync. |
@hanslemm @evantahler tried it but didn't fix anything. Attached Logs
|
It is important to refresh the source schema inside Airbyte UI after updating the connector. Otherwise, the SCD logic will remain the old one. Could you confirm that this step was done? |
@hanslemm Could you copy / paste here your generated dbt scd model please ? I don't see any change since your connector update and I'm convinced this is the file that needs the changes mentioned above AFAIU airbyte generates a dbt project for each sync so there's no chance that the fact it didn't change could be related to a leftover of previous syncs when the connector was not in the latest version, right ? |
@hanslemm huh 🤦♂️ I misread and confused refreshing the schema and resetting the data Now it makes sense and the generated scd model contains the appropriate changes. Thanks, I wish there would be a notification or automatic process that says "Hey this connector upgrade requires a schema refresh, do it (or we did it automatically for you)" |
Environment
Current Behavior
While normalizing google ads extracted data with the
Incremental | Deduped + history
strategy, since the introduction of thesegments.hour
in0.2.3
I believe, the row normalized in the final tablecampaign
is now a single hour data of the day.Meaning
Gives
But
Gives
Expected Behavior
I would expect it to keep the previous behavior by aggregating data by the date, meaning having this result basically in
campaigns
But I understand since the introduction of
segments.hour
it adds more granularity to the data so aggregating here would mean losing this new granularity, defeating the purpose of introducing it.In this case I would expect the final
campaigns
table to properly show all the the rows for each hours.Hints
I believe problem is in the generated
campaign_scd.sql
where it defines uniqueness by partitioning bysegments.date
only, missing the newsegments.hour
granularity introduced.The text was updated successfully, but these errors were encountered: