Source Google Search Console: add custom analytics stream #16433

Merged
@@ -12,5 +12,5 @@ RUN pip install .
ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

LABEL io.airbyte.version=0.1.13
LABEL io.airbyte.version=0.1.14
LABEL io.airbyte.name=airbyte/source-google-search-console
@@ -17,15 +17,19 @@ tests:
    - config_path: "secrets/config.json"
      configured_catalog_path: "integration_tests/configured_catalog.json"
      empty_streams: []
      timeout_seconds: 1800
  full_refresh:
    - config_path: "secrets/config.json"
      configured_catalog_path: "integration_tests/catalog.json"
      timeout_seconds: 1800
  incremental:
    - config_path: "secrets/config.json"
      configured_catalog_path: "integration_tests/configured_catalog_incremental.json"
      timeout_seconds: 1800
      future_state_path: "integration_tests/abnormal_state.json"
      cursor_paths:
        search_analytics_by_country: [ "https://airbyte.io", "web", "date" ]
        search_analytics_by_country: [ "https://airbyte.io", "web", "image" ]
        search_analytics_by_device: [ "https://airbyte.io", "web", "date" ]
        search_analytics_by_page: [ "https://airbyte.io", "web", "date" ]
        search_analytics_by_query: [ "https://airbyte.io", "web", "date" ]
@@ -3,9 +3,11 @@
#

import json
from typing import Any, List, Mapping, Tuple
from typing import Any, List, Mapping, Tuple, Optional

import pendulum
from jsonschema import validate

from airbyte_cdk.logger import AirbyteLogger
from airbyte_cdk.models import SyncMode
from airbyte_cdk.sources import AbstractSource
@@ -21,9 +23,35 @@
    SearchAnalyticsByQuery,
    Sitemaps,
    Sites,
    SearchAnalyticsByCustomDimensions,
)


custom_reports_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "name": {
                "type": "string",
                "minLength": 1
            },
            "dimensions": {
                "type": "array",
                "items": {
                    "type": "string",
                    "minLength": 1
                }
            }
        },
        "required": [
            "name",
            "dimensions"
        ]
    }
}


class SourceGoogleSearchConsole(AbstractSource):
    def check_connection(self, logger: AirbyteLogger, config: Mapping[str, Any]) -> Tuple[bool, Any]:
        try:
@@ -62,8 +90,22 @@ def streams(self, config: Mapping[str, Any]) -> List[Stream]:
            SearchAnalyticsAllFields(**stream_config),
        ]

        streams = streams + self.get_custom_reports(config=config, stream_config=stream_config)

        return streams

    def get_custom_reports(self, config: Mapping[str, Any], stream_config: Mapping[str, Any]) -> List[Optional[Stream]]:
        if "custom_reports" not in config:
            return []

        reports = json.loads(config["custom_reports"])
        validate(reports, custom_reports_schema)

        return [
            type(report["name"], (SearchAnalyticsByCustomDimensions,), {})(dimensions=report["dimensions"], **stream_config)
            for report in reports
        ]

    @staticmethod
    def get_stream_kwargs(config: Mapping[str, Any]) -> Mapping[str, Any]:
        authorization = config.get("authorization", {})
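
The list comprehension in `get_custom_reports` calls the three-argument form of `type()` to create one subclass per report at runtime, presumably so that each report's `name` becomes the name of its stream. The following is a minimal sketch of that pattern using only the standard library. `CustomDimensionsStream` and `build_custom_streams` are hypothetical stand-ins for `SearchAnalyticsByCustomDimensions` and `get_custom_reports`, the `name` property is a simplification rather than the CDK's behavior, and the `jsonschema` validation step is left out.

```python
import json
from typing import Any, List, Mapping


class CustomDimensionsStream:
    """Hypothetical stand-in for SearchAnalyticsByCustomDimensions."""

    def __init__(self, dimensions: List[str], **kwargs: Any) -> None:
        self.dimensions = dimensions
        self.kwargs = kwargs

    @property
    def name(self) -> str:
        # Simplification: expose the class name directly as the stream name.
        return type(self).__name__


def build_custom_streams(
    config: Mapping[str, Any], stream_config: Mapping[str, Any]
) -> List[CustomDimensionsStream]:
    # No custom reports configured: contribute no extra streams.
    if "custom_reports" not in config:
        return []

    # The config value is a string holding a JSON array of report objects.
    reports = json.loads(config["custom_reports"])

    return [
        # type(name, bases, namespace) builds a new subclass on the fly,
        # which is then instantiated with the report's dimensions.
        type(report["name"], (CustomDimensionsStream,), {})(
            dimensions=report["dimensions"], **stream_config
        )
        for report in reports
    ]


config = {
    "custom_reports": json.dumps(
        [
            {"name": "by_country_and_device", "dimensions": ["country", "device"]},
            {"name": "by_page", "dimensions": ["page"]},
        ]
    )
}
streams = build_custom_streams(config, stream_config={"start_date": "2022-01-01"})
for stream in streams:
    print(stream.name, stream.dimensions)
```

Each report ends up as an instance of its own class, so two reports with the same `name` would produce two distinct classes that share a class name.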
@@ -105,6 +105,12 @@
}
}
]
},
"custom_reports": {
"order": 4,
"type": "string",
"title": "Custom Reports (Optional)",
"description": "A JSON array describing the custom reports you want to sync from Google Search Console. See <a href=\"https://docs.airbyte.com/integrations/sources/google-search-console#step-2-set-up-the-google-search-console-connector-in-airbyte\">the docs</a> for more information about the exact format you can use to fill out this field."
}
}
},
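
Because the spec declares `custom_reports` as a string that holds a JSON array, the value has to be serialized before it goes into the config. Here is a small sketch of building that string and checking its shape. The report names are made up for illustration, and `matches_custom_reports_schema` is a hypothetical hand-written stand-in for the `jsonschema.validate` call in the connector, not part of the PR.

```python
import json
from typing import Any

# Hypothetical reports; the names are illustrative only.
reports = [
    {"name": "country_device_report", "dimensions": ["date", "country", "device"]},
    {"name": "page_query_report", "dimensions": ["date", "page", "query"]},
]

# The config field is a string, so the array is serialized first.
config_fragment = {"custom_reports": json.dumps(reports)}


def matches_custom_reports_schema(value: Any) -> bool:
    """Hand-rolled check mirroring custom_reports_schema (stdlib only)."""
    if not isinstance(value, list):
        return False
    for item in value:
        if not isinstance(item, dict):
            return False
        name = item.get("name")
        dimensions = item.get("dimensions")
        # "name" is required and must be a non-empty string.
        if not isinstance(name, str) or len(name) < 1:
            return False
        # "dimensions" is required and must be an array of non-empty strings.
        if not isinstance(dimensions, list):
            return False
        if not all(isinstance(d, str) and len(d) >= 1 for d in dimensions):
            return False
    return True


parsed = json.loads(config_fragment["custom_reports"])
print(matches_custom_reports_schema(parsed))
# A single object that is not wrapped in an array does not match the schema.
print(matches_custom_reports_schema({"name": "x", "dimensions": ["date"]}))
```

Note that the schema only requires non-empty strings, so it does not restrict `dimensions` to the values the connector knows about.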
@@ -295,3 +295,65 @@ class SearchAnalyticsByQuery(SearchAnalytics):

class SearchAnalyticsAllFields(SearchAnalytics):
    dimensions = ["date", "country", "device", "page", "query"]


class SearchAnalyticsByCustomDimensions(SearchAnalytics):
    def __init__(self, dimensions: List[str], *args, **kwargs):
        super(SearchAnalyticsByCustomDimensions, self).__init__(*args, **kwargs)
        self.dimensions = dimensions

    def get_json_schema(self) -> Mapping[str, Any]:
        try:
            return super(SearchAnalyticsByCustomDimensions, self).get_json_schema()
        except IOError:
            schema: Mapping[str, Any] = {
                "$schema": "http://json-schema.org/draft-07/schema#",
                "type": ["null", "object"],
                "additionalProperties": True,
                "properties": {
                    "clicks": {
                        "type": ["null", "integer"]
                    },
                    "ctr": {
                        "type": ["null", "number"],
                        "multipleOf": 1e-25
                    },
                    "date": {
                        "type": ["null", "string"],
                        "format": "date"
                    },
                    "impressions": {
                        "type": ["null", "integer"]
                    },
                    "position": {
                        "type": ["null", "number"],
                        "multipleOf": 1e-25
                    },
                    "search_type": {
                        "type": ["null", "string"]
                    },
                    "site_url": {
                        "type": ["null", "string"]
                    },
                }
            }

            dimension_properties = self.dimension_to_property_schema()
            schema["properties"].update(dimension_properties)

            return schema

    def dimension_to_property_schema(self) -> dict:
        dimension_to_property_schema_map = {
            'country': [{"country": {"type": ["null", "string"]}}],
            'date': [],
            'device': [{"device": {"type": ["null", "string"]}}],
            'page': [{"page": {"type": ["null", "string"]}}],
            'query': [{"query": {"type": ["null", "string"]}}]
        }
        properties = {}
        for dimension in sorted(self.dimensions):
            fields = dimension_to_property_schema_map[dimension]
            for field in fields:
                properties = {**properties, **field}
        return properties
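
To see what `dimension_to_property_schema` contributes to the fallback schema, here is a standalone sketch of the same lookup. `DIMENSION_PROPERTIES` and `dimension_properties` are hypothetical names; the mapping is copied from the diff above. `date` maps to an empty list because the base schema already defines a `date` property.

```python
from typing import Any, Dict, List

# Copied from the mapping in the diff; "date" adds nothing because the
# fallback schema already contains a "date" property.
DIMENSION_PROPERTIES: Dict[str, List[Dict[str, Any]]] = {
    "country": [{"country": {"type": ["null", "string"]}}],
    "date": [],
    "device": [{"device": {"type": ["null", "string"]}}],
    "page": [{"page": {"type": ["null", "string"]}}],
    "query": [{"query": {"type": ["null", "string"]}}],
}


def dimension_properties(dimensions: List[str]) -> Dict[str, Any]:
    """Merge the schema fragments for the given dimensions, in sorted order."""
    properties: Dict[str, Any] = {}
    for dimension in sorted(dimensions):
        # Raises KeyError for a dimension that is not in the mapping.
        for field in DIMENSION_PROPERTIES[dimension]:
            properties.update(field)
    return properties


props = dimension_properties(["page", "date", "country"])
print(list(props))  # ['country', 'page']

try:
    dimension_properties(["searchAppearance"])
except KeyError as err:
    print("unsupported dimension:", err)
```

Since `custom_reports_schema` only checks that each dimension is a non-empty string, a dimension missing from this mapping would pass validation and then raise `KeyError` whenever this fallback path builds the schema. Whether that is acceptable is a question for the PR author.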
36 changes: 20 additions & 16 deletions docs/integrations/sources/google-search-console.md
@@ -66,14 +66,16 @@ At the end of this process, you should have JSON credentials to this Google Serv
4. Click Authenticate your account to sign in with Google and authorize your account.
5. Fill in the `site_urls` field.
5. Fill in the `start date` field.
6. You should be ready to sync data.
6. Optionally, fill in the `custom reports` field with a JSON array of reports in the format `[{"name": "<report-name>", "dimensions": ["<dimension-name>", ...]}]`.
7. You should be ready to sync data.

### For Airbyte Open Source:

1. Fill in the `service_account_info` and `email` fields for authentication.
2. Fill in the `site_urls` field.
3. Fill in the `start date` field.
4. You should be ready to sync data.
4. Optionally, fill in the `custom reports` field with a JSON array of reports in the format `[{"name": "<report-name>", "dimensions": ["<dimension-name>", ...]}]`.
5. You should be ready to sync data.


## Supported sync modes
@@ -98,6 +100,7 @@ The google search console source connector supports the following [sync modes](h
* [Analytics report by device](https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query)
* [Analytics report by page](https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query)
* [Analytics report by query](https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query)
* Analytics report by custom dimensions


## Performance considerations
@@ -117,18 +120,19 @@ This connector attempts to back off gracefully when it hits Reports API's rate l

## Changelog

| Version | Date | Pull Request | Subject |
|:---------| :--- | :--- | :--- |
| `0.1.13` | 2022-07-21 | [14924](https://github.com/airbytehq/airbyte/pull/14924) | Remove `additionalProperties` field from specs |
| `0.1.12` | 2022-05-04 | [12482](https://github.com/airbytehq/airbyte/pull/12482) | Update input configuration copy |
| `0.1.11` | 2022-01-05 | [9186](https://github.com/airbytehq/airbyte/pull/9186) [9194](https://github.com/airbytehq/airbyte/pull/9194) | Fix incremental sync: keep all urls in state object |
| `0.1.10` | 2021-12-23 | [9073](https://github.com/airbytehq/airbyte/pull/9073) | Add slicing by date range |
| `0.1.9` | 2021-12-22 | [9047](https://github.com/airbytehq/airbyte/pull/9047) | Add 'order' to spec.json props |
| `0.1.8` | 2021-12-21 | [8248](https://github.com/airbytehq/airbyte/pull/8248) | Enable Sentry for performance and errors tracking |
| `0.1.7` | 2021-11-26 | [7431](https://github.com/airbytehq/airbyte/pull/7431) | Add default `end_date` param value |
| `0.1.6` | 2021-09-27 | [6460](https://github.com/airbytehq/airbyte/pull/6460) | Update OAuth Spec File |
| `0.1.4` | 2021-09-23 | [6394](https://github.com/airbytehq/airbyte/pull/6394) | Update Doc link Spec File |
| `0.1.3` | 2021-09-23 | [6405](https://github.com/airbytehq/airbyte/pull/6405) | Correct Spec File |
| `0.1.2` | 2021-09-17 | [6222](https://github.com/airbytehq/airbyte/pull/6222) | Correct Spec File |
| Version | Date | Pull Request | Subject |
|:---------|:-----------| :--- |:------------------------------------------------------------|
| `0.1.14` | 2022-09-08 | [16433](https://github.com/airbytehq/airbyte/pull/16433) | Add custom analytics stream. |
| `0.1.13` | 2022-07-21 | [14924](https://github.com/airbytehq/airbyte/pull/14924) | Remove `additionalProperties` field from specs |
| `0.1.12` | 2022-05-04 | [12482](https://github.com/airbytehq/airbyte/pull/12482) | Update input configuration copy |
| `0.1.11` | 2022-01-05 | [9186](https://github.com/airbytehq/airbyte/pull/9186) [9194](https://github.com/airbytehq/airbyte/pull/9194) | Fix incremental sync: keep all urls in state object |
| `0.1.10` | 2021-12-23 | [9073](https://github.com/airbytehq/airbyte/pull/9073) | Add slicing by date range |
| `0.1.9` | 2021-12-22 | [9047](https://github.com/airbytehq/airbyte/pull/9047) | Add 'order' to spec.json props |
| `0.1.8` | 2021-12-21 | [8248](https://github.com/airbytehq/airbyte/pull/8248) | Enable Sentry for performance and errors tracking |
| `0.1.7` | 2021-11-26 | [7431](https://github.com/airbytehq/airbyte/pull/7431) | Add default `end_date` param value |
| `0.1.6` | 2021-09-27 | [6460](https://github.com/airbytehq/airbyte/pull/6460) | Update OAuth Spec File |
| `0.1.4` | 2021-09-23 | [6394](https://github.com/airbytehq/airbyte/pull/6394) | Update Doc link Spec File |
| `0.1.3` | 2021-09-23 | [6405](https://github.com/airbytehq/airbyte/pull/6405) | Correct Spec File |
| `0.1.2` | 2021-09-17 | [6222](https://github.com/airbytehq/airbyte/pull/6222) | Correct Spec File |
| `0.1.1` | 2021-09-22 | [6315](https://github.com/airbytehq/airbyte/pull/6315) | Verify access to all sites when performing connection check |
| `0.1.0` | 2021-09-03 | [5350](https://github.com/airbytehq/airbyte/pull/5350) | Initial Release |
| `0.1.0` | 2021-09-03 | [5350](https://github.com/airbytehq/airbyte/pull/5350) | Initial Release |