subquery calculated columns not supported as dimensions with sql_mode=only_full_group_by #29616

ThChatz · 2024-07-17T10:29:05Z

Bug description

Selecting a subquery calculated column as a dimension breaks charts

How to reproduce the bug

Add a subquery calculated column on a dataset
Try to use it as a dimension

Screenshots/recordings

No response

Superset version

4.0.2

Python version

Not applicable

Node version

Not applicable

Browser

Not applicable

Additional context

Simple fix would be to only use the column alias instead of the entire subquery in the generated SQL query

Checklist

I have searched Superset docs and Slack and didn't find a solution to my problem.
I have searched the GitHub issue tracker and didn't find a similar bug report.
I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

dosubot · 2024-07-17T10:33:57Z

Hey @ThChatz! I'm here to help you with any bugs, questions, or contributions you have for the repository. Let's tackle this issue together!

To address the issue where selecting a subquery calculated column as a dimension breaks charts in Superset version 4.0.2, you need to handle the sql_mode=only_full_group_by setting. This setting enforces strict SQL standards, which can cause issues when using subqueries in SELECT statements without proper GROUP BY clauses.

Here are the steps to fix this:

Disable ONLY_FULL_GROUP_BY Mode: Modify your MySQL configuration file (my.cnf or my.ini) to remove ONLY_FULL_GROUP_BY from the sql_mode setting, then restart your MySQL server.
```
[mysqld]
sql_mode = "STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
```

Modify the Query in Superset: Ensure that your subquery includes all necessary columns in the GROUP BY clause. This can be done by modifying the query generation logic in Superset to include all selected columns in the GROUP BY clause.

Example modification in the query generation logic:

if need_groupby:
    # dedup columns while preserving order
    columns = groupby or columns
    for selected in columns:
        if isinstance(selected, str):
            # if groupby field/expr equals granularity field/expr
            if selected == granularity:
                table_col = columns_by_name[selected]
                outer = table_col.get_timestamp_expression(
                    time_grain=time_grain,
                    label=selected,
                    template_processor=template_processor,
                )
            # if groupby field equals a selected column
            elif selected in columns_by_name:
                outer = self.convert_tbl_column_to_sqla_col(
                    columns_by_name[selected],
                    template_processor=template_processor,
                )
            else:
                selected = validate_adhoc_subquery(
                    selected,
                    self.database_id,
                    self.schema,
                )
                outer = literal_column(f"({selected})")
                outer = self.make_sqla_column_compatible(outer, selected)
        else:
            outer = self.adhoc_column_to_sqla(
                col=selected, template_processor=template_processor
            )
        groupby_all_columns[outer.name] = outer
        if (
            is_timeseries and not series_column_labels
        ) or outer.name in series_column_labels:
            groupby_series_columns[outer.name] = outer
        select_exprs.append(outer)

This code ensures that all selected columns are included in the GROUP BY clause, which should prevent issues related to the ONLY_FULL_GROUP_BY setting ^[1].

To continue talking to Dosu, mention @dosu.

ThChatz · 2024-07-17T12:37:36Z

Bad bot

rusackas · 2024-07-17T17:25:19Z

I do this all the time and haven't had problem. Can you give more details about this, or provide a reproducible test case using some of the example datasets?

ThChatz · 2024-07-17T19:30:03Z

Probably has to do with only_full_group_by but still the issue is visible with default datasets:
create a calculated column on the main.channel dataset with the query
(SELECT name FROM channels WHERE channels.id=channel_id)
Create a table with that column as the dimension and the default count metric

the query you get is:
SELECT (SELECT name FROM channels WHERE channels.id=channel_id) AS test, COUNT(*) AS count FROM main.channel_members GROUP BY (SELECT name FROM channels WHERE channels.id=channel_id) ORDER BY count DESC LIMIT 1000 OFFSET 0;

while a more compatible query would be
SELECT (SELECT name FROM channels WHERE channels.id=channel_id) AS test, COUNT(*) AS count FROM main.channel_members GROUP BY test ORDER BY count DESC LIMIT 1000 OFFSET 0;

dosubot bot added the change:backend Requires changing the backend label Jul 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

subquery calculated columns not supported as dimensions with sql_mode=only_full_group_by #29616

subquery calculated columns not supported as dimensions with sql_mode=only_full_group_by #29616

ThChatz commented Jul 17, 2024 •

edited

Loading

dosubot bot commented Jul 17, 2024

ThChatz commented Jul 17, 2024

rusackas commented Jul 17, 2024

ThChatz commented Jul 17, 2024

subquery calculated columns not supported as dimensions with sql_mode=only_full_group_by #29616

subquery calculated columns not supported as dimensions with sql_mode=only_full_group_by #29616

Comments

ThChatz commented Jul 17, 2024 • edited Loading

Bug description

How to reproduce the bug

Screenshots/recordings

Superset version

Python version

Node version

Browser

Additional context

Checklist

dosubot bot commented Jul 17, 2024

ThChatz commented Jul 17, 2024

rusackas commented Jul 17, 2024

ThChatz commented Jul 17, 2024

ThChatz commented Jul 17, 2024 •

edited

Loading