Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sql lab): display presto and trino tracking url #20799

Merged
merged 3 commits into from
Jul 27, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions UPDATING.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ assists people when migrating to a new version.
## Next

- [20606](https://github.com/apache/superset/pull/20606): When user clicks on chart title or "Edit chart" button in Dashboard page, Explore opens in the same tab. Clicking while holding cmd/ctrl opens Explore in a new tab. To bring back the old behaviour (always opening Explore in a new tab), flip feature flag `DASHBOARD_EDIT_CHART_IN_NEW_TAB` to `True`.
- [20799](https://github.com/apache/superset/pull/20799): Presto and Trino engine will now display tracking URL for running queries in SQL Lab. If for some reason you don't want to show the tracking URL (for example, when your data warehouse hasn't enable access for to Presto or Trino UI), update `TRACKING_URL_TRANSFORMER` in `config.py` to return `None`.

### Breaking Changes

Expand Down
14 changes: 14 additions & 0 deletions docs/docs/contributing/testing-locally.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,20 @@ You can run unit tests found in './tests/unit_tests' for example with pytest. It
pytest ./link_to_test.py
```

#### Testing with local Presto connections

If you happen to change db engine spec for Presto/Trino, you can run a local Presto cluster with Docker:

```bash
docker run -p 15433:15433 starburstdata/presto:350-e.6
```

Then update `SUPERSET__SQLALCHEMY_EXAMPLES_URI` to point to local Presto cluster:

```bash
export SUPERSET__SQLALCHEMY_EXAMPLES_URI=presto://localhost:15433/memory/default
```

### Frontend Testing

We use [Jest](https://jestjs.io/) and [Enzyme](https://airbnb.io/enzyme/) to test TypeScript/JavaScript. Tests can be run with:
Expand Down
28 changes: 17 additions & 11 deletions superset-frontend/src/SqlLab/components/ResultSet/index.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -109,6 +109,9 @@ const ResultSetButtons = styled.div`

const ResultSetErrorMessage = styled.div`
padding-top: ${({ theme }) => 4 * theme.gridUnit}px;
.sql-result-track-job {
margin-top: ${({ theme }) => 2 * theme.gridUnit}px;
}
`;

export default class ResultSet extends React.PureComponent<
Expand Down Expand Up @@ -417,6 +420,19 @@ export default class ResultSet extends React.PureComponent<
if (this.props.database && this.props.database.explore_database_id) {
exploreDBId = this.props.database.explore_database_id;
}
let trackingUrl;
if (query.trackingUrl) {
trackingUrl = (
<Button
className="sql-result-track-job"
buttonSize="small"
href={query.trackingUrl}
target="_blank"
>
{query.state === 'running' ? t('Track job') : t('See query details')}
</Button>
);
}

if (this.props.showSql) sql = <HighlightedSql sql={query.sql} />;

Expand All @@ -434,6 +450,7 @@ export default class ResultSet extends React.PureComponent<
link={query.link}
source="sqllab"
/>
{trackingUrl}
</ResultSetErrorMessage>
);
}
Expand Down Expand Up @@ -550,7 +567,6 @@ export default class ResultSet extends React.PureComponent<
);
}
}
let trackingUrl;
let progressBar;
if (query.progress > 0) {
progressBar = (
Expand All @@ -560,16 +576,6 @@ export default class ResultSet extends React.PureComponent<
/>
);
}
if (query.trackingUrl) {
trackingUrl = (
<Button
buttonSize="small"
onClick={() => query.trackingUrl && window.open(query.trackingUrl)}
>
{t('Track job')}
</Button>
);
}
const progressMsg =
query && query.extra && query.extra.progress
? query.extra.progress
Expand Down
52 changes: 12 additions & 40 deletions superset-frontend/src/components/Button/index.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -16,13 +16,15 @@
* specific language governing permissions and limitations
* under the License.
*/
import React, { CSSProperties, Children, ReactElement } from 'react';
import React, { Children, ReactElement } from 'react';
import { kebabCase } from 'lodash';
import { mix } from 'polished';
import cx from 'classnames';
import { AntdButton } from 'src/components';
import { useTheme } from '@superset-ui/core';
import { Tooltip } from 'src/components/Tooltip';
import { ButtonProps as AntdButtonProps } from 'antd/lib/button';
import { TooltipProps } from 'antd/lib/tooltip';

export type OnClickHandler = React.MouseEventHandler<HTMLElement>;

Expand All @@ -37,45 +39,15 @@ export type ButtonStyle =
| 'link'
| 'dashed';

export interface ButtonProps {
id?: string;
className?: string;
tooltip?: string;
ghost?: boolean;
placement?:
| 'bottom'
| 'left'
| 'right'
| 'top'
| 'topLeft'
| 'topRight'
| 'bottomLeft'
| 'bottomRight'
| 'leftTop'
| 'leftBottom'
| 'rightTop'
| 'rightBottom';
onClick?: OnClickHandler;
onMouseDown?: OnClickHandler;
disabled?: boolean;
buttonStyle?: ButtonStyle;
buttonSize?: 'default' | 'small' | 'xsmall';
style?: CSSProperties;
children?: React.ReactNode;
href?: string;
htmlType?: 'button' | 'submit' | 'reset';
cta?: boolean;
loading?: boolean | { delay?: number | undefined } | undefined;
showMarginRight?: boolean;
type?:
| 'default'
| 'text'
| 'link'
| 'primary'
| 'dashed'
| 'ghost'
| undefined;
}
export type ButtonProps = Omit<AntdButtonProps, 'css'> &
Pick<TooltipProps, 'placement'> & {
tooltip?: string;
className?: string;
buttonSize?: 'default' | 'small' | 'xsmall';
buttonStyle?: ButtonStyle;
cta?: boolean;
showMarginRight?: boolean;
};

export default function Button(props: ButtonProps) {
const {
Expand Down
8 changes: 7 additions & 1 deletion superset/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -995,7 +995,13 @@ def CSV_TO_HIVE_UPLOAD_DIRECTORY_FUNC( # pylint: disable=invalid-name
# into a proxied one


TRACKING_URL_TRANSFORMER = lambda x: x
# Transform SQL query tracking url for Hive and Presto engines. You may also
# access information about the query itself by adding a second parameter
# to your transformer function, e.g.:
# TRACKING_URL_TRANSFORMER = (
# lambda url, query: url if is_fresh(query) else None
# )
TRACKING_URL_TRANSFORMER = lambda url: url
ktmud marked this conversation as resolved.
Show resolved Hide resolved


# Interval between consecutive polls when using Hive Engine
Expand Down
11 changes: 2 additions & 9 deletions superset/db_engine_specs/hive.py
Original file line number Diff line number Diff line change
Expand Up @@ -315,7 +315,7 @@ def progress(cls, log_lines: List[str]) -> int:
return int(progress)

@classmethod
def get_tracking_url(cls, log_lines: List[str]) -> Optional[str]:
def get_tracking_url_from_logs(cls, log_lines: List[str]) -> Optional[str]:
lkp = "Tracking URL = "
for line in log_lines:
if lkp in line:
Expand Down Expand Up @@ -366,21 +366,14 @@ def handle_cursor( # pylint: disable=too-many-locals
query.progress = progress
needs_commit = True
if not tracking_url:
tracking_url = cls.get_tracking_url(log_lines)
tracking_url = cls.get_tracking_url_from_logs(log_lines)
if tracking_url:
job_id = tracking_url.split("/")[-2]
logger.info(
"Query %s: Found the tracking url: %s",
str(query_id),
tracking_url,
)
transformer = current_app.config["TRACKING_URL_TRANSFORMER"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktmud per you PR description this logic has changed. Can we confirmed that the previous behavior is preserved? I just want to confirm this isn't a breaking change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tracking_url = transformer(tracking_url)
logger.info(
"Query %s: Transformation applied: %s",
str(query_id),
tracking_url,
)
query.tracking_url = tracking_url
logger.info("Query %s: Job id: %s", str(query_id), str(job_id))
needs_commit = True
Expand Down
23 changes: 22 additions & 1 deletion superset/db_engine_specs/presto.py
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,12 @@
# prevent circular imports
from superset.models.core import Database

# need try/catch because pyhive may not be installed
try:
from pyhive.presto import Cursor # pylint: disable=unused-import
except ImportError:
pass

COLUMN_DOES_NOT_EXIST_REGEX = re.compile(
"line (?P<location>.+?): .*Column '(?P<column_name>.+?)' cannot be resolved"
)
Expand Down Expand Up @@ -957,8 +963,23 @@ def get_create_view(
return rows[0][0]

@classmethod
def handle_cursor(cls, cursor: Any, query: Query, session: Session) -> None:
def get_tracking_url(cls, cursor: "Cursor") -> Optional[str]:
try:
if cursor.last_query_id:
# pylint: disable=protected-access, line-too-long
return f"{cursor._protocol}://{cursor._host}:{cursor._port}/ui/query.html?{cursor.last_query_id}"
except AttributeError:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pass
return None

Copy link
Member Author

@ktmud ktmud Jul 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final return is expected to be explicit if you have a return statement earlier. I think there is a pylint or mypy warning for this.

return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return None


@classmethod
def handle_cursor(cls, cursor: "Cursor", query: Query, session: Session) -> None:
"""Updates progress information"""
tracking_url = cls.get_tracking_url(cursor)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm itching for us to drop Python 3.8 so we can use the walrus operator.

if tracking_url:
query.tracking_url = tracking_url
ktmud marked this conversation as resolved.
Show resolved Hide resolved
session.commit()

query_id = query.id
poll_interval = query.database.connect_args.get(
"poll_interval", current_app.config["PRESTO_POLL_INTERVAL"]
Expand Down
24 changes: 23 additions & 1 deletion superset/db_engine_specs/trino.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,11 @@
if TYPE_CHECKING:
from superset.models.core import Database

try:
from trino.dbapi import Cursor # pylint: disable=unused-import
except ImportError:
pass

logger = logging.getLogger(__name__)


Expand Down Expand Up @@ -109,8 +114,25 @@ def get_view_names(
)

@classmethod
def handle_cursor(cls, cursor: Any, query: Query, session: Session) -> None:
def get_tracking_url(cls, cursor: "Cursor") -> Optional[str]:
try:
return cursor.info_uri
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info_uri is only available in the latest version of the Trino client, which we haven't upgrade to yet.

except AttributeError:
try:
conn = cursor.connection
# pylint: disable=protected-access, line-too-long
return f"{conn.http_scheme}://{conn.host}:{conn.port}/ui/query.html?{cursor._query.query_id}"
except AttributeError:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment.

return None

@classmethod
def handle_cursor(cls, cursor: "Cursor", query: Query, session: Session) -> None:
"""Updates progress information"""
tracking_url = cls.get_tracking_url(cursor)
if tracking_url:
query.tracking_url = tracking_url
session.commit()
BaseEngineSpec.handle_cursor(cursor=cursor, query=query, session=session)

@staticmethod
Expand Down
30 changes: 28 additions & 2 deletions superset/models/sql_lab.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,13 +15,15 @@
# specific language governing permissions and limitations
# under the License.
"""A collection of ORM sqlalchemy models for SQL Lab"""
import inspect
import logging
import re
from datetime import datetime
from typing import Any, Dict, List, Optional, Type, TYPE_CHECKING

import simplejson as json
import sqlalchemy as sqla
from flask import Markup
from flask import current_app, Markup
from flask_appbuilder import Model
from flask_appbuilder.models.decorators import renders
from humanize import naturaltime
Expand Down Expand Up @@ -56,6 +58,9 @@
from superset.db_engine_specs import BaseEngineSpec


logger = logging.getLogger(__name__)


class Query(Model, ExtraJSONMixin, ExploreMixin): # pylint: disable=abstract-method
"""ORM model for SQL query

Expand Down Expand Up @@ -104,7 +109,7 @@ class Query(Model, ExtraJSONMixin, ExploreMixin): # pylint: disable=abstract-me
start_running_time = Column(Numeric(precision=20, scale=6))
end_time = Column(Numeric(precision=20, scale=6))
end_result_backend_time = Column(Numeric(precision=20, scale=6))
tracking_url = Column(Text)
tracking_url_raw = Column(Text, name="tracking_url")

changed_on = Column(
DateTime, default=datetime.utcnow, onupdate=datetime.utcnow, nullable=True
Expand Down Expand Up @@ -279,6 +284,27 @@ def default_endpoint(self) -> str:
def get_extra_cache_keys(query_obj: Dict[str, Any]) -> List[str]:
return []

@property
def tracking_url(self) -> Optional[str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just be a method called get_tracking_url and then you wouldn't need to rename the column virtually or have a getter/setter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want the FAB-based API to always return the transformed url but I couldn't find a clean way to do it without adding an (unnecessary) new field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dpgaspar @betodealmeida do you have any recommendation on what's the best practice for such use case?

"""
Transfrom tracking url at run time because the exact URL may depends
on query properties such as execution and finish time.
"""
transform = current_app.config.get("TRACKING_URL_TRANSFORMER")
url = self.tracking_url_raw
if url and transform:
sig = inspect.signature(transform)
# for backward compatibility, users may define a transformer function
# with only one parameter (`url`).
args = [url, self][: len(sig.parameters)]
url = transform(*args)
logger.debug("Transformed tracking url: %s", url)
return url

@tracking_url.setter
def tracking_url(self, value: str) -> None:
self.tracking_url_raw = value


class SavedQuery(Model, AuditMixinNullable, ExtraJSONMixin, ImportExportMixin):
"""ORM model for SQL query"""
Expand Down
9 changes: 8 additions & 1 deletion superset/sql_lab.py
Original file line number Diff line number Diff line change
Expand Up @@ -96,8 +96,13 @@ def handle_query_error(
msg = f"{prefix_message} {str(ex)}".strip()
troubleshooting_link = config["TROUBLESHOOTING_LINK"]
query.error_message = msg
query.status = QueryStatus.FAILED
query.tmp_table_name = None
query.status = QueryStatus.FAILED
# TODO: re-enable this after updating the frontend to properly display timeout status
# if query.status != QueryStatus.TIMED_OUT:
# query.status = QueryStatus.FAILED
ktmud marked this conversation as resolved.
Show resolved Hide resolved
if not query.end_time:
query.end_time = now_as_float()

# extract DB-specific errors (invalid column, eg)
if isinstance(ex, SupersetErrorException):
Expand Down Expand Up @@ -286,6 +291,8 @@ def execute_sql_statement( # pylint: disable=too-many-arguments,too-many-statem
# return 1 row less than increased_query
data = data[:-1]
except SoftTimeLimitExceeded as ex:
query.status = QueryStatus.TIMED_OUT

logger.warning("Query %d: Time limit exceeded", query.id)
logger.debug("Query %d: %s", query.id, ex)
raise SupersetErrorException(
Expand Down
2 changes: 1 addition & 1 deletion superset/sqllab/exceptions.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@
MSG_FORMAT = "Failed to execute {}"

if TYPE_CHECKING:
from superset.utils.sqllab_execution_context import SqlJsonExecutionContext
from superset.sqllab.sqllab_execution_context import SqlJsonExecutionContext
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bycatch: wrong import



class SqlLabException(SupersetException):
Expand Down
1 change: 1 addition & 0 deletions superset/utils/dates.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@


def datetime_to_epoch(dttm: datetime) -> float:
"""Convert datetime to milliseconds to epoch"""
if dttm.tzinfo:
dttm = dttm.replace(tzinfo=pytz.utc)
epoch_with_tz = pytz.utc.localize(EPOCH)
Expand Down
Loading