[Ray DATA] preprocessor OneHotEncoder AttributeError: 'ClientObjectRef' object has no attribute 'exec_stats' #28262
Comments
Hey, can you try this? Looks like the unresolved metadata is being passed into `DatasetStats`:
  metadata = [get_metadata.remote(t) for t in tables]
+ metadata = ray.get(metadata)
  execution_plan = ExecutionPlan(
-     BlockList(tables, ray.get(metadata), owned_by_consumer=True),
+     BlockList(tables, metadata, owned_by_consumer=True),
      DatasetStats(stages={"from_arrow_refs": metadata}, parent=None),
      run_by_consumer=True)
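For anyone reading later, the suggested change amounts to resolving the metadata references once, before anything reads fields off them. Here is a minimal sketch of that failure mode using only the standard library. It is an analogy, not Ray code: `concurrent.futures.Future` stands in for a Ray object reference, and `BlockMetadata` and `get_metadata` are hypothetical stand-ins for the real classes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class BlockMetadata:
    """Hypothetical stand-in for per-block metadata (not Ray's class)."""
    num_rows: int
    exec_stats: str


def get_metadata(table):
    # Hypothetical helper: compute metadata for one "block" (a plain list here).
    return BlockMetadata(num_rows=len(table), exec_stats=f"{len(table)} rows scanned")


tables = [[1, 2, 3], [4, 5]]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Like `get_metadata.remote(t)`: this returns handles, not values.
    pending = [pool.submit(get_metadata, t) for t in tables]

    # Reading a field off the unresolved handle fails with AttributeError,
    # which is the same class of error reported in this issue.
    try:
        pending[0].exec_stats
        unresolved_error = None
    except AttributeError as exc:
        unresolved_error = str(exc)

    # Like `ray.get(metadata)`: resolve every handle exactly once.
    metadata = [f.result() for f in pending]

# After resolution, every consumer sees real metadata objects.
resolved_stats = [m.exec_stats for m in metadata]
```

Resolving once and reusing the resolved list also avoids the original inconsistency, where one consumer received resolved values and another received raw references.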
Solved it. Thank you very much!
Great to hear! @Alxe1 could you share more about what you're trying to do with Ray and Spark?
All my data is in a Hive data warehouse, so I have to use Spark SQL to read or join tables, and then use Spark to preprocess the output (string indexing, bucketizing, one-hot encoding, and so on). I find some preprocessors in Ray are more efficient than their Spark counterparts, but Ray only has a few preprocessors, so I want to make use of the ones it has. Most importantly, I can use Ray to train and test deep learning models in distributed mode.
Thanks, that's great to hear! Which preprocessors would you like us to add to Ray?
One important preprocessor, I think, is K-bins discretization of continuous features. Others such as TF-IDF and a binarization transform would also be useful. :)
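To make the K-bins request concrete, here is a rough sketch of equal-width discretization in plain Python. It is only an illustration of the technique, not an existing Ray preprocessor; the function name `kbins_uniform` and its parameters are hypothetical, and a real implementation would also need fit/transform separation and other binning strategies (for example quantile-based bins).

```python
def kbins_uniform(values, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1].

    Hypothetical helper for illustration only.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if not values:
        return []

    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls in the first bin.
        return [0] * len(values)

    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


bins = kbins_uniform([0.0, 2.5, 5.0, 7.5, 10.0], n_bins=4)
```

With four bins over the range 0 to 10, each bin is 2.5 wide, and the largest value is clamped into the last bin rather than opening a fifth one.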
Closing this issue for now.
What happened + What you expected to happen
I preprocess data using the `OneHotEncoder` from `ray.data.preprocessors`. I followed the normal process, but when the preprocessor transforms the Ray dataset, it raises the `AttributeError` shown in the title.
The Ray dataset is converted from a PySpark DataFrame (I am on Spark 2.4.6, so I cannot use RayDP and do the conversion myself).
The code can print the Ray dataset, but the preprocessor cannot transform it. How can I deal with this?
Versions / Dependencies
ray 2.0.0
Reproduction script
See "What happened" above.
Issue Severity
No response