Allow Frameless to explode Map[A,B] #488
def prop[A: TypedEncoder: ClassTag](xs: List[X1[Map[A, A]]]): Prop = {
  val tds = TypedDataset.create(xs)

  val framelessResults = tds.explodeMap('a).collect().run().toVector
@OlivierBlanvillain @imarios I would need your help here if possible, since I know nothing about shapeless :/
The compiler is complaining:
No column Symbol with shapeless.tag.Tagged[String("a")] of type scala.collection.immutable.Map[A,B] in frameless.X1[scala.collection.immutable.Map[A,A]]
Am I missing something in the definition of explodeMap?
Hey @ayoub-benali, sorry for the delay; I was trying to get the new version out and then help get another, older PR merged. Let me take a look and help you out.
9462828
to
27c9484
Compare
exploded
// map explode explodes it into [key, value] columns
// the only way to put it into a column is to create a struct
// TODO: handle org.apache.spark.sql.AnalysisException: Reference 'key / value' is ambiguous, could be: key / value, key / value
@ayoub-benali I pushed into your branch. One minor thing here is that we don't handle duplicate column names, i.e.
// for the case class
case class Test(key: Int, m: Map[String, Int])
// explode function produces the following schema
// key, key, value
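To make the name clash concrete, here is a minimal plain-Scala model of how the output column names end up duplicated. It is only a sketch: it does not call Spark or Frameless, and `ExplodeSchemaModel`, `explodeMapColumn` and `ambiguous` are hypothetical names invented for illustration. The only thing taken from the discussion above is that exploding a map yields columns named `key` and `value`.

```scala
// Hypothetical model, not Frameless or Spark code: a schema is just a list of names.
object ExplodeSchemaModel {
  // Names of the two columns produced by exploding a map column.
  val ExplodedNames: Vector[String] = Vector("key", "value")

  // Replace the exploded map column with the generated key/value columns,
  // leaving every other column where it was.
  def explodeMapColumn(columns: Vector[String], mapColumn: String): Vector[String] =
    columns.flatMap(c => if (c == mapColumn) ExplodedNames else Vector(c))

  // Column names that occur more than once, i.e. references that would be ambiguous.
  def ambiguous(columns: Vector[String]): Set[String] =
    columns.groupBy(identity).collect { case (name, occ) if occ.size > 1 => name }.toSet
}
```

For `case class Test(key: Int, m: Map[String, Int])`, `explodeMapColumn(Vector("key", "m"), "m")` gives `key, key, value`, and `ambiguous` reports `key` as the clashing name.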
There are multiple ways to work around it (e.g. rename all input columns first, perform the explode, then rename them back),
but I didn't have time to try that.
I also don't remember whether there is a select by index or some other elegant way to handle it.
Hm, maybe aliasing would not work either, due to interpreter limitations. It can just be a feature then (: I don't think there are good ways to solve that behavior in the native Spark API either.
Pushed a couple of changes, @ayoub-benali; waiting for your review / comments there!
  check(forAll(prop[String, Int] _))
}

test("explode on maps preserving other columns") {
An extra check that verifies that explode does not drop other columns.
 * case class X(i: Int, j: Map[Int, Int])
 * case class Y(i: Int, j: (Int, Int))
 *
 * val f: TypedDataset[X] = ???
 * val fNew: TypedDataset[Y] = f.explodeMap('j).as[Y]
This is actually a working example
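For readers unfamiliar with explode, the `X` to `Y` transformation in the doc comment can be mirrored on an ordinary `List`. This is a sketch of the intended semantics only, not the Frameless implementation: `ExplodeMapSemantics` and `explodeJ` are hypothetical names, and the behaviour for empty maps follows Spark's plain `explode`, which emits no row for them.

```scala
// Plain-Scala analogue of f.explodeMap('j).as[Y]; no Spark or Frameless involved.
object ExplodeMapSemantics {
  case class X(i: Int, j: Map[Int, Int])
  case class Y(i: Int, j: (Int, Int))

  // One output row per map entry; the other field (i) is copied unchanged.
  // A row whose map is empty contributes no output rows.
  def explodeJ(xs: List[X]): List[Y] =
    xs.flatMap(x => x.j.toList.map(entry => Y(x.i, entry)))
}
```

For example, `explodeJ(List(X(1, Map(10 -> 20, 30 -> 40))))` contains `Y(1, (10, 20))` and `Y(1, (30, 40))`: the map column becomes a `(key, value)` tuple and `i` is preserved on every row.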
c205dc6
to
ecdd727
Compare
@ayoub-benali I don't know if you're still around; the review of this PR took almost a year :D But be careful: I rebased your PR on the main branch.
Codecov Report
@@ Coverage Diff @@
## master #488 +/- ##
=======================================
Coverage 95.14% 95.14%
=======================================
Files 65 65
Lines 1134 1134
Branches 5 6 +1
=======================================
Hits 1079 1079
Misses 55 55
ecdd727
to
375dcc4
Compare
375dcc4
to
f487989
Compare
Thanks for the help @pomadchin! The changes look good to me 👍
Fixes #383