Hey all! As part of my thesis I am doing research on spark-rapids, comparing GPU and CPU processing for biological sequencing, essentially constructing De Bruijn graphs from a large text file. The part of the code I want to accelerate is fairly simple; the only complicated operations I require that are not already implemented are … Is there any way to implement these with a UDF in a GPU-accelerated way? The data I want to use both functions on is …
As far as I can tell, …
Replies: 6 comments
We support UDFs (sort of). If the UDF is really simple and can be translated into a Catalyst expression, then we can do some things with that if you turn it on (very experimental). I don't think what you are doing is something we support yet for translation to Catalyst. The other option is that you can write your own UDFs, either using CUDA directly or using the Java cudf API. These can give you a lot of control. But it looks like your UDF is really just a join. You have an array mapping partition IDs to some other number, and you want to look it up based on that partition ID. That is a join.
As a side note, we are working on collect_list and collect_set for aggregations.
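To make the "that is a join" point concrete, here is a minimal sketch in plain Python (no Spark, no GPU). All names and values (`partition_offsets`, `rows`, the k-mer strings) are hypothetical and not taken from the original post. It only illustrates that indexing an array by partition ID gives the same result as joining the rows against a small (partition_id, offset) table.

```python
# Hypothetical data: names and values are illustrative only.
partition_offsets = [100, 250, 400]  # array indexed by partition id

rows = [  # (partition_id, kmer) pairs
    (0, "ACG"),
    (2, "CGT"),
    (1, "GTA"),
    (2, "TAC"),
]


def lookup_with_array(rows, offsets):
    """UDF-style approach: index into the array once per row."""
    return [(pid, kmer, offsets[pid]) for pid, kmer in rows]


def lookup_with_join(rows, offsets):
    """Join-style approach: treat the array as a (partition_id, offset)
    table and inner-join it with the rows on partition_id."""
    offset_table = list(enumerate(offsets))  # (partition_id, offset) rows
    build_side = dict(offset_table)          # hash table keyed on the join key
    return [
        (pid, kmer, build_side[pid])
        for pid, kmer in rows
        if pid in build_side                 # inner join: keep matching keys only
    ]


via_array = lookup_with_array(rows, partition_offsets)
via_join = lookup_with_join(rows, partition_offsets)
assert via_array == via_join
```

In Spark terms, the small table would presumably be a second DataFrame joined on the partition ID column, so the lookup becomes an ordinary join instead of a custom UDF.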
Thank you for the quick and detailed response! How about collect_set in windowing? I am under the impression that the cudf library supports collect_set in its Java API.
cudf just did a code freeze for our next release, and we will be doing our own code freeze shortly, so remembering what is in previous releases gets a bit complicated. https://nvidia.github.io/spark-rapids/docs/supported_ops.html should list all of the operations for the current release on Apache Spark 3.0.0.
@jlowe I don't think we support UDAFs yet for RapidsUDFs. Do we?
@revans2 correct, UDAFs are not yet supported.
Closing this as answered. Feel free to reopen if there's more to discuss.