Move groupby.agg logic into query compiler #1879

Closed
ienkovich opened this issue Jul 31, 2020 · 1 comment · Fixed by #1885
Labels
Code Quality 💯 Improvements or issues to improve quality of codebase

Comments

@ienkovich
Collaborator

ienkovich commented Jul 31, 2020

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 19.04
  • Modin installed from (source or binary): source
  • Modin version: master
  • Python version: 3.8.3
  • Exact command to reproduce: python test.py

Test case:

import os

# Select the Ray engine via the environment before importing modin.pandas.
os.environ["MODIN_ENGINE"] = "ray"

import modin.pandas as pd

data = {
    "a": [1, 1, 2, 2],
    "b": [11, 21, 12, 11],
}

df = pd.DataFrame(data)
# Group by a single column and aggregate another column with a dict spec.
ref = df.groupby("a").agg({"b": "mean"})
print(ref)
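For reference, the values the test case should print can be worked out by hand. The sketch below is a minimal pure-Python stand-in for `df.groupby("a").agg({"b": "mean"})` (no pandas or Modin involved; `groupby_agg_mean` is a hypothetical helper): it groups the values of column `b` by column `a` and averages each group.

```python
from collections import defaultdict


def groupby_agg_mean(data, by, column):
    """Hypothetical helper: group data[column] by data[by] and average each group."""
    groups = defaultdict(list)
    for key, value in zip(data[by], data[column]):
        groups[key].append(value)
    # Sort by key, mirroring groupby's default sort=True.
    return {key: sum(values) / len(values) for key, values in sorted(groups.items())}


data = {
    "a": [1, 1, 2, 2],
    "b": [11, 21, 12, 11],
}
print(groupby_agg_mean(data, "a", "b"))  # {1: 16.0, 2: 11.5}
```

So whichever layer ends up executing the aggregation, group `a == 1` should give a mean `b` of 16.0 and group `a == 2` should give 11.5.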

In the execution log I see

UserWarning: `DataFrame.groupby_on_multiple_columns` defaulting to pandas implementation.
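One way to catch this kind of silent fallback in a test is to record warnings and fail on any `UserWarning` whose message contains "defaulting to pandas", the phrase in the warning above. A minimal sketch using the standard `warnings` module; `assert_no_pandas_fallback` and `fake_groupby_agg` are hypothetical names, and the fake function only imitates the logged warning text rather than calling Modin.

```python
import warnings


def assert_no_pandas_fallback(func, *args, **kwargs):
    """Hypothetical helper: run func, raise if it emits a 'defaulting to pandas' UserWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let repeated warnings be suppressed
        result = func(*args, **kwargs)
    fallbacks = [
        str(w.message)
        for w in caught
        if issubclass(w.category, UserWarning) and "defaulting to pandas" in str(w.message)
    ]
    if fallbacks:
        raise AssertionError(f"fell back to pandas: {fallbacks}")
    return result


def fake_groupby_agg():
    # Stand-in that imitates the warning from the execution log.
    warnings.warn(
        "`DataFrame.groupby_on_multiple_columns` defaulting to pandas implementation.",
        UserWarning,
    )
    return "result"
```

With Modin installed, the test case's call could be wrapped as `assert_no_pandas_fallback(lambda: df.groupby("a").agg({"b": "mean"}))`; given the warning in the log, it should raise on the version reported here.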

Previously, such aggregations were processed in the OmniSci backend; now they default to pandas in the frontend. This effectively 'breaks' the OmniSci backend, because the processing no longer happens in OmniSci. I don't know when the degradation happened; probably some code was moved from the query compiler to the frontend.
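The change named in the issue title is a layering one: the frontend should hand the whole dict-style aggregation to the query compiler, so that a backend such as OmniSci can implement it natively, while the base query compiler keeps the default-to-pandas behaviour as a fallback. The toy sketch below illustrates only that pattern; `BaseQueryCompiler`, `OmniSciLikeQueryCompiler`, `FrontendDataFrame` and their `groupby_agg` methods are hypothetical names, not Modin's actual classes or signatures.

```python
import warnings


class BaseQueryCompiler:
    """Hypothetical base compiler: owns the groupby.agg entry point."""

    def __init__(self, data):
        self.data = data  # dict: column name -> list of values

    def groupby_agg(self, by, agg_spec):
        # Generic fallback, analogous to "defaulting to pandas".
        warnings.warn("groupby_agg defaulting to pandas implementation.", UserWarning)
        return self._reference_agg(by, agg_spec)

    def _reference_agg(self, by, agg_spec):
        funcs = {"mean": lambda v: sum(v) / len(v), "sum": sum, "min": min, "max": max}
        result = {}
        for col, func_name in agg_spec.items():
            groups = {}
            for key, value in zip(self.data[by], self.data[col]):
                groups.setdefault(key, []).append(value)
            result[col] = {k: funcs[func_name](v) for k, v in sorted(groups.items())}
        return result


class OmniSciLikeQueryCompiler(BaseQueryCompiler):
    """Hypothetical backend compiler that handles the aggregation itself."""

    def groupby_agg(self, by, agg_spec):
        # A real backend would build its own execution plan here;
        # this toy just reuses the reference computation without warning.
        return self._reference_agg(by, agg_spec)


class FrontendDataFrame:
    """Hypothetical frontend: forwards the dict spec instead of handling it."""

    def __init__(self, query_compiler):
        self._qc = query_compiler

    def groupby_agg(self, by, agg_spec):
        return self._qc.groupby_agg(by, agg_spec)
```

Because the frontend never inspects `agg_spec`, the backend compiler decides whether to run the aggregation natively or fall back; with `BaseQueryCompiler` the call warns, with `OmniSciLikeQueryCompiler` it does not.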

ienkovich added the Regression ↩️ label (Something that used to work but doesn't anymore) Jul 31, 2020
@devin-petersohn
Collaborator

I am going to label this slightly differently. The patch was never upstreamed.

devin-petersohn added the Code Quality 💯 label (Improvements or issues to improve quality of codebase) and removed the Regression ↩️ label (Something that used to work but doesn't anymore) Jul 31, 2020
devin-petersohn changed the title from "[BUG] Defaulting to pandas on a simple groupby by a single column" to "Move groupby.agg logic into query compiler" Jul 31, 2020
devin-petersohn added a commit to devin-petersohn/modin that referenced this issue Jul 31, 2020
devin-petersohn added a commit to devin-petersohn/modin that referenced this issue Sep 1, 2020
YarShev pushed a commit that referenced this issue Sep 3, 2020
aregm pushed a commit to aregm/modin that referenced this issue Sep 16, 2020

2 participants