Unable to use custom druid dimensions in queries #3889

Closed
2 of 3 tasks
jcollado opened this issue Nov 16, 2017 · 1 comment
Labels
inactive Inactive for >= 30 days

jcollado commented Nov 16, 2017

Make sure these boxes are checked before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if any
  • I have reproduced the issue with at least the latest released version of superset
  • I have checked the issue tracker for the same issue and I haven't found one similar

Superset version

0.20 (I tried to upgrade to a newer version, but superset db upgrade didn't work for me)

Expected results

Custom dimensions can be used in Druid queries, and the queries return data as usual.

Actual results

Using a custom dimension in a druid query fails with the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/superset/viz.py", line 253, in get_payload
    df = self.get_df()
  File "/usr/local/lib/python2.7/dist-packages/superset/viz.py", line 79, in get_df
    self.results = self.datasource.query(query_obj)
  File "/usr/local/lib/python2.7/dist-packages/superset/connectors/druid/models.py", line 1003, in query
    client=client, query_obj=query_obj, phase=2)
  File "/usr/local/lib/python2.7/dist-packages/superset/connectors/druid/models.py", line 799, in get_query_str
    return self.run_query(client=client, phase=phase, **query_obj)
  File "/usr/local/lib/python2.7/dist-packages/superset/connectors/druid/models.py", line 982, in run_query
    qry['dimensions'], filters)
  File "/usr/local/lib/python2.7/dist-packages/superset/connectors/druid/models.py", line 808, in _add_filter_from_pre_query_data
    f = Dimension(dim) == row[dim]
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 642, in __getitem__
    return self._get_with(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 683, in _get_with
    return self.loc[key]
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1328, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1541, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1081, in _getitem_iterable
    self._has_valid_type(key, axis)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1418, in _has_valid_type
    (key, self.obj._get_axis_name(axis)))
KeyError: "None of [[u'outputName', u'extractionFn', u'type', u'dimension', u'outputType']] are in the [index]"

I think the problem is that superset.connectors.druid.models.DruidDatasource._add_filter_from_pre_query_data expects the dimensions parameter to be a list of strings, whereas it can be a list of dictionaries when custom dimensions are used. In that case row[dim] indexes the pandas Series with the dictionary's keys, which matches the KeyError above.

Following this guess, I tried adding something like:

if isinstance(dim, dict):
    dim = dim['outputName']

but, although the error goes away with this change, the query returns no data.
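One possible explanation for the empty result, not confirmed in this report, is that the phase-2 filter is then built on the output name (newDimension), which is not a real column in the Druid datasource, so a plain selector filter on it matches nothing. A minimal sketch of an alternative follows, assuming the filter is expressed as Druid native-query JSON and that the pre-query rows are available as plain dictionaries. The helper names output_name, selector_filter and filters_from_rows are hypothetical and are not part of Superset or pydruid.

```python
def output_name(dim):
    """Return the result-column name for a plain or custom dimension."""
    if isinstance(dim, dict):
        return dim["outputName"]
    return dim


def selector_filter(dim, value):
    """Build a Druid selector filter (as a plain dict) for one dimension value.

    For a custom dimension, filter on the underlying column and carry the
    extraction function along, instead of filtering on the output name.
    """
    if isinstance(dim, dict):
        flt = {"type": "selector", "dimension": dim["dimension"], "value": value}
        if "extractionFn" in dim:
            flt["extractionFn"] = dim["extractionFn"]
        return flt
    return {"type": "selector", "dimension": dim, "value": value}


def filters_from_rows(rows, dimensions):
    """Combine pre-query rows into one filter: OR over rows, AND over dimensions."""
    per_row = []
    for row in rows:
        fields = [selector_filter(d, row[output_name(d)]) for d in dimensions]
        per_row.append(fields[0] if len(fields) == 1 else {"type": "and", "fields": fields})
    if not per_row:
        return None
    return per_row[0] if len(per_row) == 1 else {"type": "or", "fields": per_row}


# Hypothetical data shaped like the custom dimension from this issue.
custom_dim = {
    "type": "extraction",
    "dimension": "oldDimension",
    "outputName": "newDimension",
    "extractionFn": {"type": "javascript", "function": "function(value) { return value; }"},
}
rows = [{"newDimension": "a"}, {"newDimension": "b"}]
combined = filters_from_rows(rows, [custom_dim])
```

Here combined is an "or" filter whose two selector fields target oldDimension and include the extractionFn. Whether this resolves the empty result in Superset 0.20 is untested.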

Steps to reproduce

  • Add a druid datasource to superset
  • Define in the datasource a custom dimension like, for example,
    {
      "type" : "extraction",
      "dimension": "oldDimension",
      "outputName": "newDimension",
      "outputType": "STRING",
      "extractionFn": {
        "type": "javascript",
        "function": "function(value) { return value.split('-', 1)[0]; }"
      }
    }
  • In the explore screen, generate a query using the custom dimension as the "group by" criteria and "count" as the metric. The query will return the error above.
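To make the reproduction easier to follow, the sketch below parses the dimension spec from the steps above and emulates its JavaScript extraction function in Python, showing which values newDimension would hold. The set of required keys and the sample input values are assumptions chosen for illustration; Druid itself runs the JavaScript function, not this Python stand-in.

```python
import json

# The custom dimension spec from the steps above, as it would be pasted into the datasource.
SPEC = """
{
  "type": "extraction",
  "dimension": "oldDimension",
  "outputName": "newDimension",
  "outputType": "STRING",
  "extractionFn": {
    "type": "javascript",
    "function": "function(value) { return value.split('-', 1)[0]; }"
  }
}
"""


def emulate_extraction(value):
    """Python stand-in for the JavaScript function: the text before the first '-'."""
    return value.split("-", 1)[0]


spec = json.loads(SPEC)

# Illustrative sanity check (assumed key set): confirm the spec is a dictionary
# carrying the fields a query would rely on.
required = {"type", "dimension", "outputName", "extractionFn"}
missing = required - set(spec)

# Hypothetical raw values of oldDimension and the group-by values they would map to.
samples = ["eu-west-1", "us-east-2", "local"]
extracted = [emulate_extraction(v) for v in samples]
```

Because the spec is a dictionary rather than a string, this is the kind of value that reaches _add_filter_from_pre_query_data as dim when the custom dimension is used in the "group by".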

Surprisingly, if two custom dimensions are used in the "group by", the query works with the "table view" visualization, but switching to "time series - stacked" fails with the same error and stack trace.

jcollado changed the title from "Unable to use custom druid dimensions" to "Unable to use custom druid dimensions in queries" on Nov 16, 2017


stale bot commented Apr 11, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

stale bot added the "inactive" (Inactive for >= 30 days) label on Apr 11, 2019
stale bot closed this as completed on Apr 18, 2019