Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subquery calculated columns not supported as dimensions with sql_mode=only_full_group_by #29616

Open
3 tasks done
ThChatz opened this issue Jul 17, 2024 · 4 comments
Open
3 tasks done
Labels
change:backend Requires changing the backend

Comments

@ThChatz
Copy link

ThChatz commented Jul 17, 2024

Bug description

Selecting a subquery calculated column as a dimension breaks charts

How to reproduce the bug

Add a subquery calculated column on a dataset
Try to use it as a dimension

Screenshots/recordings

No response

Superset version

4.0.2

Python version

Not applicable

Node version

Not applicable

Browser

Not applicable

Additional context

Simple fix would be to only use the column alias instead of the entire subquery in the generated SQL query

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
@dosubot dosubot bot added the change:backend Requires changing the backend label Jul 17, 2024
Copy link

dosubot bot commented Jul 17, 2024

Hey @ThChatz! I'm here to help you with any bugs, questions, or contributions you have for the repository. Let's tackle this issue together!

To address the issue where selecting a subquery calculated column as a dimension breaks charts in Superset version 4.0.2, you need to handle the sql_mode=only_full_group_by setting. This setting enforces strict SQL standards, which can cause issues when using subqueries in SELECT statements without proper GROUP BY clauses.

Here are the steps to fix this:

  1. Disable ONLY_FULL_GROUP_BY Mode: Modify your MySQL configuration file (my.cnf or my.ini) to remove ONLY_FULL_GROUP_BY from the sql_mode setting, then restart your MySQL server.

    [mysqld]
    sql_mode = "STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
  2. Modify the Query in Superset: Ensure that your subquery includes all necessary columns in the GROUP BY clause. This can be done by modifying the query generation logic in Superset to include all selected columns in the GROUP BY clause.

    Example modification in the query generation logic:

    if need_groupby:
        # dedup columns while preserving order
        columns = groupby or columns
        for selected in columns:
            if isinstance(selected, str):
                # if groupby field/expr equals granularity field/expr
                if selected == granularity:
                    table_col = columns_by_name[selected]
                    outer = table_col.get_timestamp_expression(
                        time_grain=time_grain,
                        label=selected,
                        template_processor=template_processor,
                    )
                # if groupby field equals a selected column
                elif selected in columns_by_name:
                    outer = self.convert_tbl_column_to_sqla_col(
                        columns_by_name[selected],
                        template_processor=template_processor,
                    )
                else:
                    selected = validate_adhoc_subquery(
                        selected,
                        self.database_id,
                        self.schema,
                    )
                    outer = literal_column(f"({selected})")
                    outer = self.make_sqla_column_compatible(outer, selected)
            else:
                outer = self.adhoc_column_to_sqla(
                    col=selected, template_processor=template_processor
                )
            groupby_all_columns[outer.name] = outer
            if (
                is_timeseries and not series_column_labels
            ) or outer.name in series_column_labels:
                groupby_series_columns[outer.name] = outer
            select_exprs.append(outer)

This code ensures that all selected columns are included in the GROUP BY clause, which should prevent issues related to the ONLY_FULL_GROUP_BY setting [1].

To continue talking to Dosu, mention @dosu.

@ThChatz
Copy link
Author

ThChatz commented Jul 17, 2024

Bad bot

@rusackas
Copy link
Member

I do this all the time and haven't had problem. Can you give more details about this, or provide a reproducible test case using some of the example datasets?

@ThChatz
Copy link
Author

ThChatz commented Jul 17, 2024

Probably has to do with only_full_group_by but still the issue is visible with default datasets:
create a calculated column on the main.channel dataset with the query
(SELECT name FROM channels WHERE channels.id=channel_id)
Create a table with that column as the dimension and the default count metric

the query you get is:
SELECT (SELECT name FROM channels WHERE channels.id=channel_id) AS test, COUNT(*) AS count FROM main.channel_members GROUP BY (SELECT name FROM channels WHERE channels.id=channel_id) ORDER BY count DESC LIMIT 1000 OFFSET 0;

while a more compatible query would be
SELECT (SELECT name FROM channels WHERE channels.id=channel_id) AS test, COUNT(*) AS count FROM main.channel_members GROUP BY test ORDER BY count DESC LIMIT 1000 OFFSET 0;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
change:backend Requires changing the backend
Projects
None yet
Development

No branches or pull requests

2 participants