[SPARK-48247][PYTHON] Use all values in a dict when inferring MapType schema

### What changes were proposed in this pull request?

This is similar to #36545. This PR proposes to infer the map types from all pairs instead of the first pair.

### Why are the changes needed?

To have consistent behavior, e.g.,

```python
>>> spark.createDataFrame([[1], [2], ["a"], ["c"]]).collect()
[Row(_1='1'), Row(_1='2'), Row(_1='a'), Row(_1='c')]
```

### Does this PR introduce _any_ user-facing change?

Yes. See below.

**Without Spark Connect:**

```python
>>> spark.createDataFrame([{"outer": {"payment": 200.5, "name": "A"}}]).collect()
[Row(outer={'name': 'A', 'payment': '200.5'})]

>>> spark.conf.set("spark.sql.pyspark.legacy.inferMapTypeFromFirstPair.enabled", True)
>>> spark.createDataFrame([{"outer": {"payment": 200.5, "name": "A"}}]).collect()
[Row(outer={'name': None, 'payment': 200.5})]
```

**With Spark Connect:**

```python
>>> spark.createDataFrame([{"outer": {"payment": 200.5, "name": "A"}}]).collect()
[Row(outer={'payment': '200.5', 'name': 'A'})]

>>> spark.conf.set("spark.sql.pyspark.legacy.inferMapTypeFromFirstPair.enabled", True)
>>> spark.createDataFrame([{"outer": {"payment": 200.5, "name": "A"}}]).collect()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/connect/session.py", line 635, in createDataFrame
    _table = LocalDataToArrowConversion.convert(_data, _schema)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../spark/python/pyspark/sql/connect/conversion.py", line 378, in convert
    return pa.Table.from_arrays(pylist, schema=pa_schema)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/table.pxi", line 3974, in pyarrow.lib.Table.from_arrays
  File "pyarrow/table.pxi", line 1464, in pyarrow.lib._sanitize_arrays
  File "pyarrow/array.pxi", line 373, in pyarrow.lib.asarray
  File "pyarrow/array.pxi", line 343, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 42, in pyarrow.lib._sequence_to_array
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not convert 'A' with type str: tried to convert to double
```

### How was this patch tested?

Unit tests added.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46547 from HyukjinKwon/infer-map-first.

Lead-authored-by: Hyukjin Kwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
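
The difference between the two inference strategies can be illustrated without a Spark session. The following is a minimal sketch, not PySpark's actual implementation: the helper names (`infer_type`, `merge_types`, `infer_value_type_first_pair`, `infer_value_type_all_pairs`) and the string type labels are hypothetical, and the fallback to a string type on mismatch is an assumption chosen to match the outputs shown above.

```python
# Toy model of map value-type inference. All names here are hypothetical
# and exist only for illustration; this is not PySpark's code.

def infer_type(value):
    """Return a simple type label for a Python scalar."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def merge_types(a, b):
    """Merge two type labels; assume a fallback to string on mismatch."""
    if a == b or b == "null":
        return a
    if a == "null":
        return b
    return "string"


def infer_value_type_first_pair(d):
    """Legacy-style rule: look only at the first pair of the dict."""
    first_value = next(iter(d.values()))
    return infer_type(first_value)


def infer_value_type_all_pairs(d):
    """New-style rule: merge the inferred types of every value."""
    result = "null"
    for value in d.values():
        result = merge_types(result, infer_type(value))
    return result


inner = {"payment": 200.5, "name": "A"}
print(infer_value_type_first_pair(inner))  # double
print(infer_value_type_all_pairs(inner))   # string
```

Under the first-pair rule, the value type is fixed as a double from `"payment"`, so the string `"A"` no longer fits (it becomes `None` without Spark Connect, or raises an Arrow conversion error with it). Merging across all pairs yields a type that can hold both values.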
1 parent 79aeae1, commit 42c1c8f

Showing 7 changed files with 116 additions and 1 deletion.