feat(db_engine_specs): big query cost estimation #21325

Merged: 19 commits, Jan 9, 2023
Changes from 12 commits
6 changes: 3 additions & 3 deletions superset-frontend/src/SqlLab/reducers/sqlLab.js
@@ -321,7 +321,7 @@ export default function sqlLabReducer(state = {}, action) {
   ...state,
   queryCostEstimates: {
     ...state.queryCostEstimates,
-    [action.query.sqlEditorId]: {
+    [action.query.id]: {
       completed: false,
       cost: null,
       error: null,
@@ -334,7 +334,7 @@ export default function sqlLabReducer(state = {}, action) {
   ...state,
   queryCostEstimates: {
     ...state.queryCostEstimates,
-    [action.query.sqlEditorId]: {
+    [action.query.id]: {
       completed: true,
       cost: action.json,
       error: null,
@@ -347,7 +347,7 @@ export default function sqlLabReducer(state = {}, action) {
   ...state,
   queryCostEstimates: {
     ...state.queryCostEstimates,
-    [action.query.sqlEditorId]: {
+    [action.query.id]: {
       completed: false,
       cost: null,
       error: action.error,
2 changes: 1 addition & 1 deletion superset-frontend/src/views/CRUD/data/database/DatabaseModal/ExtraOptions.tsx
@@ -162,7 +162,7 @@ const ExtraOptions = ({
   />
   <InfoTooltip
     tooltip={t(
-      'For Presto and Postgres, shows a button to compute cost before running a query.',
+      'For Bigquery, Presto and Postgres, shows a button to compute cost before running a query.',
     )}
   />
 </div>
3 changes: 2 additions & 1 deletion superset/config.py
@@ -873,7 +873,8 @@ class CeleryConfig:  # pylint: disable=too-few-public-methods
 # query costs before they run. These EXPLAIN queries should have a small
 # timeout.
 SQLLAB_QUERY_COST_ESTIMATE_TIMEOUT = int(timedelta(seconds=10).total_seconds())
-# The feature is off by default, and currently only supported in Presto and Postgres.
+# The feature is off by default, and currently only supported in Presto and Postgres,
+# and Bigquery.
Member commented:

The current config, docs and examples in config.py are in a pretty horrible state right now. I observed the following:

  1. The ESTIMATE_QUERY_COST config is in fact a feature flag and needs to be moved to DEFAULT_FEATURE_FLAGS:
    ESTIMATE_QUERY_COST = False
    This should be updated.
  2. This example is broken:
    # "QUERY_COST_FORMATTERS_BY_ENGINE": {"postgresql": postgres_query_cost_formatter},
    It should in fact be:
    # QUERY_COST_FORMATTERS_BY_ENGINE = {"postgresql": postgres_query_cost_formatter}
    This should also be updated.
  3. There are no docs for this feature. In the deprecated docs (https://apache-superset.readthedocs.io/en/latest/sqllab.html) there's a SQL Lab section that seems to have been lost over the years, and its content is also incorrect. I don't expect you to add docs for this; I'm just mentioning it here to call attention to it. FYI @rusackas

Member commented:

Thanks for the callout... putting it on my very long to-do list!

Contributor Author commented:

Fixed the things and added some more comments.

About the docs, I would love to fix the content. The only thing is that the superset/blob/master/docs/sqllab.rst file doesn't even exist now.

Member commented:

@zamar-roura let's leave the docs for a follow-up PR 👍

# It also need to be enabled on a per-database basis, by adding the key/value pair
# `cost_estimate_enabled: true` to the database `extra` attribute.
ESTIMATE_QUERY_COST = False
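
The comments above describe a two-level switch: a global `ESTIMATE_QUERY_COST` setting plus a per-database `cost_estimate_enabled` key in the database `extra` attribute. A minimal Python sketch of that idea follows. The helper `cost_estimate_enabled`, its arguments, and the `database_extra` payload are hypothetical names made up for this illustration; this is not Superset's actual check.

```python
import json

# Assumption: the global switch is a plain boolean, as in `ESTIMATE_QUERY_COST = False`.
ESTIMATE_QUERY_COST = True

# Hypothetical `extra` JSON for one database; only the
# `cost_estimate_enabled` key is taken from the config comment.
database_extra = json.dumps({"cost_estimate_enabled": True})


def cost_estimate_enabled(extra_json: str, global_flag: bool) -> bool:
    """Return True only if both the global and the per-database switch are on."""
    extra = json.loads(extra_json) if extra_json else {}
    return bool(global_flag and extra.get("cost_estimate_enabled", False))


show_estimate_button = cost_estimate_enabled(database_extra, ESTIMATE_QUERY_COST)
```

With either switch off (or the key missing from `extra`), the helper returns `False`, which mirrors the "off by default" behaviour the comment describes.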
91 changes: 91 additions & 0 deletions superset/db_engine_specs/bigquery.py
@@ -31,6 +31,7 @@
 from sqlalchemy.sql import sqltypes
 from typing_extensions import TypedDict

+from superset import sql_parse
 from superset.constants import PASSWORD_MASK
 from superset.databases.schemas import encrypted_field_properties, EncryptedString
 from superset.databases.utils import make_url_safe
@@ -364,6 +365,96 @@ def df_to_sql(

         pandas_gbq.to_gbq(df, **to_gbq_kwargs)

+    @classmethod
+    def estimate_query_cost(
+        cls,
+        database: "Database",
+        schema: str,
+        sql: str,
+        source: Optional[utils.QuerySource] = None,
+    ) -> List[Dict[str, Any]]:
+        """
+        Estimate the cost of a multiple statement SQL query.
+
+        :param database: Database instance
+        :param schema: Database schema
+        :param sql: SQL query with possibly multiple statements
+        :param source: Source of the query (eg, "sql_lab")
+        """
+        extra = database.get_extra() or {}
+        if not cls.get_allow_cost_estimate(extra):
+            raise Exception("Database does not support cost estimation")
+
+        parsed_query = sql_parse.ParsedQuery(sql)
+        statements = parsed_query.get_statements()
+        costs = []
+        for statement in statements:
+            processed_statement = cls.process_statement(statement, database)
+
+            costs.append(cls.estimate_statement_cost(processed_statement, database))
+        return costs
+
+    @classmethod
+    def get_allow_cost_estimate(cls, extra: Dict[str, Any]) -> bool:
+        return True

+    @classmethod
+    def estimate_statement_cost(cls, statement: str, cursor: Any) -> Dict[str, Any]:
+        try:
+            # pylint: disable=import-outside-toplevel
+            # It's the only way to perform a dry-run cost estimate
+            from google.cloud import bigquery
+            from google.oauth2 import service_account
+        except ImportError as ex:
+            raise Exception(
+                "Could not import libraries `google.cloud.bigquery` or "
+                "`google.oauth2`, which are required to be installed in your "
+                "environment in order to estimate query costs in BigQuery"
+            ) from ex
+
+        with cls.get_engine(cursor) as engine:
+            creds = engine.dialect.credentials_info
+
+        creds = service_account.Credentials.from_service_account_info(creds)
+        client = bigquery.Client(credentials=creds)
+        job_config = bigquery.QueryJobConfig(dry_run=True)
+
+        query_job = client.query(
+            statement,
+            job_config=job_config,
+        )  # Make an API request.
+
+        # Format Bytes.
+        byte_division = 1024
+        if hasattr(query_job, "total_bytes_processed"):
+            query_bytes_processed = query_job.total_bytes_processed
+            if query_bytes_processed // byte_division == 0:
+                byte_type = "B"
+                total_bytes_processed = query_bytes_processed
+            elif query_bytes_processed // (byte_division**2) == 0:
+                byte_type = "KB"
+                total_bytes_processed = round(query_bytes_processed / byte_division, 2)
+            elif query_bytes_processed // (byte_division**3) == 0:
+                byte_type = "MB"
+                total_bytes_processed = round(
+                    query_bytes_processed / (byte_division**2), 2
+                )
+            else:
+                byte_type = "GB"
+                total_bytes_processed = round(
+                    query_bytes_processed / (byte_division**3), 2
+                )
+
+            return {f"{byte_type} Processed": total_bytes_processed}
+        return {}
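
The unit selection in the method above (B, KB, MB, capped at GB, base 1024, two decimals) can also be written as a loop. The following is a minimal standalone sketch of that same logic, not code from this PR; the name `bytes_processed_label` and the `UNITS` tuple are hypothetical.

```python
from typing import Dict, Union

# Units used by the diff above; anything larger stays expressed in GB.
UNITS = ("B", "KB", "MB", "GB")


def bytes_processed_label(num_bytes: int, base: int = 1024) -> Dict[str, Union[int, float]]:
    """Scale a byte count to the largest fitting unit and label it."""
    value: Union[int, float] = num_bytes
    unit = UNITS[0]
    for next_unit in UNITS[1:]:
        if value < base:
            break
        value = value / base
        unit = next_unit
    if unit != "B":
        # Plain byte counts stay integers; scaled values get two decimals.
        value = round(value, 2)
    return {f"{unit} Processed": value}


small = bytes_processed_label(512)
medium = bytes_processed_label(1536)
large = bytes_processed_label(5 * 1024**3)
```

Here `small` is `{"B Processed": 512}`, `medium` is `{"KB Processed": 1.5}` and `large` is `{"GB Processed": 5.0}`. A loop like this makes it easier to add larger units later, at the cost of being slightly less explicit than the chained `elif` version.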

+    @classmethod
+    def query_cost_formatter(
+        cls, raw_cost: List[Dict[str, Any]]
+    ) -> List[Dict[str, str]]:
+        print([{k: str(v) for k, v in row.items()} for row in raw_cost])
+        return [{k: str(v) for k, v in row.items()} for row in raw_cost]
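
To make the formatter's input and output shapes concrete, here is a small standalone sketch of the same stringifying step, applied to made-up estimates for a two-statement query. The function name `stringify_costs` and the sample data are hypothetical. The last line assumes that a formatter override is keyed by engine name, following the `QUERY_COST_FORMATTERS_BY_ENGINE = {"postgresql": ...}` example from the review comment; the `"bigquery"` key is an assumption.

```python
from typing import Any, Dict, List


def stringify_costs(raw_cost: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Convert every cost value to a string, one dict per statement."""
    return [{key: str(value) for key, value in row.items()} for row in raw_cost]


# Hypothetical raw estimates, one dict per SQL statement.
raw = [{"KB Processed": 1.5}, {"GB Processed": 2.25}]
formatted = stringify_costs(raw)

# Assumption: a custom formatter would be registered per engine name.
QUERY_COST_FORMATTERS_BY_ENGINE = {"bigquery": stringify_costs}
```

After this runs, `formatted` is `[{"KB Processed": "1.5"}, {"GB Processed": "2.25"}]`: the keys are unchanged and only the values become strings.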

@classmethod
def build_sqlalchemy_uri(
cls,
2 changes: 1 addition & 1 deletion superset/translations/de/LC_MESSAGES/messages.json
@@ -1963,7 +1963,7 @@
   "Font size for the smallest value in the list": [
     "Schriftgröße für den kleinsten Wert in der Liste"
   ],
-  "For Presto and Postgres, shows a button to compute cost before running a query.": [
+  "For Bigquery, Presto and Postgres, shows a button to compute cost before running a query.": [
     "Für Presto und Postgres wird ein Buttons angezeigt, um Kosten vor dem Ausführen einer Abfrage zu schätzen."
   ],
   "For regular filters, these are the roles this filter will be applied to. For base filters, these are the roles that the filter DOES NOT apply to, e.g. Admin if admin should see all data.": [
2 changes: 1 addition & 1 deletion superset/translations/de/LC_MESSAGES/messages.po
@@ -6114,7 +6114,7 @@ msgstr "Schriftgröße für den kleinsten Wert in der Liste"

 #: superset-frontend/src/views/CRUD/data/database/DatabaseModal/ExtraOptions.tsx:179
 msgid ""
-"For Presto and Postgres, shows a button to compute cost before running a "
+"For Bigquery, Presto and Postgres, shows a button to compute cost before running a "
 "query."
 msgstr ""
 "Für Presto und Postgres wird ein Buttons angezeigt, um Kosten vor dem "
2 changes: 1 addition & 1 deletion superset/translations/en/LC_MESSAGES/messages.po
@@ -5693,7 +5693,7 @@ msgstr ""

 #: superset-frontend/src/views/CRUD/data/database/DatabaseModal/ExtraOptions.tsx:179
 msgid ""
-"For Presto and Postgres, shows a button to compute cost before running a "
+"For Bigquery, Presto and Postgres, shows a button to compute cost before running a "
 "query."
 msgstr ""

2 changes: 1 addition & 1 deletion superset/translations/es/LC_MESSAGES/messages.po
@@ -5975,7 +5975,7 @@ msgstr ""
 #: superset-frontend/src/views/CRUD/data/database/DatabaseModal/ExtraOptions.tsx:179
 #, fuzzy
 msgid ""
-"For Presto and Postgres, shows a button to compute cost before running a "
+"For Bigquery, Presto and Postgres, shows a button to compute cost before running a "
 "query."
 msgstr "Estimar el costo antes de ejecutar una consulta"

2 changes: 1 addition & 1 deletion superset/translations/fr/LC_MESSAGES/messages.json
@@ -3419,7 +3419,7 @@
   "Enable query cost estimation": [
     "Activer l'estimation du coût de la requête"
   ],
-  "For Presto and Postgres, shows a button to compute cost before running a query.": [
+  "For Bigquery, Presto and Postgres, shows a button to compute cost before running a query.": [
     "Pour Presto et Postgres, affiche un bouton pour calculer le coût avant d'exécuter une requête."
   ],
   "Allow this database to be explored": [
2 changes: 1 addition & 1 deletion superset/translations/fr/LC_MESSAGES/messages.po
@@ -6118,7 +6118,7 @@ msgstr ""

 #: superset-frontend/src/views/CRUD/data/database/DatabaseModal/ExtraOptions.tsx:179
 msgid ""
-"For Presto and Postgres, shows a button to compute cost before running a "
+"For Bigquery, Presto and Postgres, shows a button to compute cost before running a "
 "query."
 msgstr ""
 "Pour Presto et Postgres, affiche un bouton pour calculer le coût avant "
2 changes: 1 addition & 1 deletion superset/translations/it/LC_MESSAGES/messages.po
@@ -5848,7 +5848,7 @@ msgstr ""

 #: superset-frontend/src/views/CRUD/data/database/DatabaseModal/ExtraOptions.tsx:179
 msgid ""
-"For Presto and Postgres, shows a button to compute cost before running a "
+"For Bigquery, Presto and Postgres, shows a button to compute cost before running a "
 "query."
 msgstr ""

2 changes: 1 addition & 1 deletion superset/translations/ja/LC_MESSAGES/messages.po
@@ -5833,7 +5833,7 @@ msgstr ""

 #: superset-frontend/src/views/CRUD/data/database/DatabaseModal/ExtraOptions.tsx:179
 msgid ""
-"For Presto and Postgres, shows a button to compute cost before running a "
+"For Bigquery, Presto and Postgres, shows a button to compute cost before running a "
 "query."
 msgstr ""

2 changes: 1 addition & 1 deletion superset/translations/ko/LC_MESSAGES/messages.po
@@ -5798,7 +5798,7 @@ msgstr ""

 #: superset-frontend/src/views/CRUD/data/database/DatabaseModal/ExtraOptions.tsx:179
 msgid ""
-"For Presto and Postgres, shows a button to compute cost before running a "
+"For Bigquery, Presto and Postgres, shows a button to compute cost before running a "
 "query."
 msgstr ""

2 changes: 1 addition & 1 deletion superset/translations/messages.pot
@@ -5698,7 +5698,7 @@ msgstr ""

 #: superset-frontend/src/views/CRUD/data/database/DatabaseModal/ExtraOptions.tsx:179
 msgid ""
-"For Presto and Postgres, shows a button to compute cost before running a "
+"For Bigquery, Presto and Postgres, shows a button to compute cost before running a "
 "query."
 msgstr ""

2 changes: 1 addition & 1 deletion superset/translations/nl/LC_MESSAGES/messages.json
@@ -4487,7 +4487,7 @@
     "Sta manipulatie van de database toe met niet-SELECT statements zoals UPDATE, DELETE, CREATE, enz."
   ],
   "Enable query cost estimation": [""],
-  "For Presto and Postgres, shows a button to compute cost before running a query.": [
+  "For Bigquery, Presto and Postgres, shows a button to compute cost before running a query.": [
     ""
   ],
   "Allow this database to be explored": [""],
2 changes: 1 addition & 1 deletion superset/translations/nl/LC_MESSAGES/messages.po
@@ -15015,7 +15015,7 @@ msgstr ""

 #: superset-frontend/src/views/CRUD/data/database/DatabaseModal/ExtraOptions.tsx:179
 msgid ""
-"For Presto and Postgres, shows a button to compute cost before running a "
+"For Bigquery, Presto and Postgres, shows a button to compute cost before running a "
 "query."
 msgstr ""

2 changes: 1 addition & 1 deletion superset/translations/pt_BR/LC_MESSAGES/messages.po
@@ -6115,7 +6115,7 @@ msgstr ""
 #: superset-frontend/src/views/CRUD/data/database/DatabaseModal/ExtraOptions.tsx:179
 #, fuzzy
 msgid ""
-"For Presto and Postgres, shows a button to compute cost before running a "
+"For Bigquery, Presto and Postgres, shows a button to compute cost before running a "
 "query."
 msgstr "Estima o custo antes de executar uma consulta"

2 changes: 1 addition & 1 deletion superset/translations/ru/LC_MESSAGES/messages.po
@@ -6040,7 +6040,7 @@ msgstr ""
 #: superset-frontend/src/views/CRUD/data/database/DatabaseModal/ExtraOptions.tsx:179
 #, fuzzy
 msgid ""
-"For Presto and Postgres, shows a button to compute cost before running a "
+"For Bigquery, Presto and Postgres, shows a button to compute cost before running a "
 "query."
 msgstr "Спрогнозировать время до выполнения запроса"

2 changes: 1 addition & 1 deletion superset/translations/sk/LC_MESSAGES/messages.po
@@ -5710,7 +5710,7 @@ msgstr ""

 #: superset-frontend/src/views/CRUD/data/database/DatabaseModal/ExtraOptions.tsx:179
 msgid ""
-"For Presto and Postgres, shows a button to compute cost before running a "
+"For Bigquery, Presto and Postgres, shows a button to compute cost before running a "
 "query."
 msgstr ""

15 changes: 3 additions & 12 deletions superset/translations/sl/LC_MESSAGES/messages.json
@@ -3453,17 +3453,8 @@
   "Are you sure you want to overwrite this dataset?": [
     "Ali ste prepričani, da želite prepisati podatkovni set?"
   ],
-  "Undefined": ["Ni definirano"],
-  "Save": ["Shrani"],
-  "Save as": ["Shrani kot"],
-  "Save query": ["Shrani poizvedbo"],
-  "Update": ["Posodobi"],
-  "Label for your query": ["Ime vaše poizvedbe"],
-  "Write a description for your query": ["Dodajte opis vaše poizvedbe"],
-  "Schedule query": ["Urnik poizvedb"],
-  "Schedule": ["Urnik"],
-  "There was an error with your request": [
-    "Pri zahtevi je prišlo do napake"
+  "For Bigquery, Presto and Postgres, shows a button to compute cost before running a query.": [
+    "Za Presto in Postgres prikaže gumb za izračun potratnosti pred zagonom poizvedbe."
   ],
   "Please save the query to enable sharing": [
     "Shranite poizvedbo za deljenje"
@@ -5138,7 +5129,7 @@
   "Enable query cost estimation": [
     "Omogoči ocenjevanje potratnosti poizvedbe"
   ],
-  "For Presto and Postgres, shows a button to compute cost before running a query.": [
+  "For Bigquery, Presto and Postgres, shows a button to compute cost before running a query.": [
     "Za Presto in Postgres prikaže gumb za izračun potratnosti pred zagonom poizvedbe."
   ],
   "Allow this database to be explored": [
7 changes: 3 additions & 4 deletions superset/translations/sl/LC_MESSAGES/messages.po
@@ -6696,9 +6696,8 @@ msgstr "Ciljno razmerje za razdelke drevesnega grafikona."

 #: superset-frontend/plugins/legacy-plugin-chart-treemap/src/index.js:31
 msgid ""
-"Shows the composition of a dataset by segmenting a given rectangle as smaller "
-"rectangles with areas proportional to their value or contribution to the whole. "
-"Those rectangles may also, in turn, be further segmented hierarchically."
+"For Bigquery, Presto and Postgres, shows a button to compute cost before running a "
+"query."
 msgstr ""
 "Prikaže zgradbo podatkovnega seta na podlagi segmentacije danega pravokotnika na "
 "manjše pravokotnike, pri čemer je ploščina sorazmerna vrednostim oz. deležem. "
@@ -15788,7 +15787,7 @@ msgstr "Omogoči ocenjevanje potratnosti poizvedbe"

 #: superset-frontend/src/views/CRUD/data/database/DatabaseModal/ExtraOptions.tsx:179
 msgid ""
-"For Presto and Postgres, shows a button to compute cost before running a query."
+"For Bigquery, Presto and Postgres, shows a button to compute cost before running a query."
 msgstr ""
 "Za Presto in Postgres prikaže gumb za izračun potratnosti pred zagonom poizvedbe."

2 changes: 1 addition & 1 deletion superset/translations/zh/LC_MESSAGES/messages.po
@@ -5924,7 +5924,7 @@ msgstr "列表中最小值的字体大小"
 #: superset-frontend/src/views/CRUD/data/database/DatabaseModal/ExtraOptions.tsx:179
 #, fuzzy
 msgid ""
-"For Presto and Postgres, shows a button to compute cost before running a "
+"For Bigquery, Presto and Postgres, shows a button to compute cost before running a "
 "query."
 msgstr "在运行查询之前计算执行计划"