[SPARK-44889][PYTHON][CONNECT] Fix docstring of `monotonically_increasing_id`

### What changes were proposed in this pull request?
Fix the docstring of `monotonically_increasing_id`.

### Why are the changes needed?
1. Use `from pyspark.sql import functions as F` to avoid an implicit wildcard import.
2. Use DataFrame APIs instead of RDD APIs, so the docstring can be reused in Connect.

After this fix, all docstrings are reused between vanilla PySpark and the Spark Connect Python Client.

### Does this PR introduce _any_ user-facing change?
yes

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#42582 from zhengruifeng/fix_monotonically_increasing_id_docstring.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
zhengruifeng committed Aug 21, 2023
1 parent 3ab2064 commit 72c62b6
Showing 2 changed files with 16 additions and 6 deletions.
3 changes: 0 additions & 3 deletions python/pyspark/sql/connect/functions.py
@@ -3901,9 +3901,6 @@ def _test() -> None:
 
     globs = pyspark.sql.connect.functions.__dict__.copy()
 
-    # Spark Connect does not support Spark Context but the test depends on that.
-    del pyspark.sql.connect.functions.monotonically_increasing_id.__doc__
-
     globs["spark"] = (
         PySparkSession.builder.appName("sql.connect.functions tests")
         .remote("local[4]")
19 changes: 16 additions & 3 deletions python/pyspark/sql/functions.py
@@ -4312,9 +4312,22 @@ def monotonically_increasing_id() -> Column:
     Examples
     --------
-    >>> df0 = sc.parallelize(range(2), 2).mapPartitions(lambda x: [(1,), (2,), (3,)]).toDF(['col1'])
-    >>> df0.select(monotonically_increasing_id().alias('id')).collect()
-    [Row(id=0), Row(id=1), Row(id=2), Row(id=8589934592), Row(id=8589934593), Row(id=8589934594)]
+    >>> from pyspark.sql import functions as F
+    >>> spark.range(0, 10, 1, 2).select(F.monotonically_increasing_id()).show()
+    +-----------------------------+
+    |monotonically_increasing_id()|
+    +-----------------------------+
+    |                            0|
+    |                            1|
+    |                            2|
+    |                            3|
+    |                            4|
+    |                   8589934592|
+    |                   8589934593|
+    |                   8589934594|
+    |                   8589934595|
+    |                   8589934596|
+    +-----------------------------+
     """
     return _invoke_function("monotonically_increasing_id")
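For context on the example output: the jump from 4 to 8589934592 is expected. Spark's documentation for `monotonically_increasing_id` states that the generated ID packs the partition index into the upper bits and a per-partition record counter into the lower 33 bits, so `spark.range(0, 10, 1, 2)` (10 rows in 2 partitions) yields 0-4 in partition 0 and IDs starting at `1 << 33` in partition 1. A minimal pure-Python sketch of that layout (the names `RECORD_BITS` and `sketch_id` are illustrative, not from Spark's source):

```python
# Sketch of the documented ID layout of monotonically_increasing_id():
# upper bits hold the partition index, lower 33 bits the record number.
RECORD_BITS = 33  # the 33-bit split is documented Spark behavior

def sketch_id(partition: int, record: int) -> int:
    """ID a row would get, given its partition index and row offset."""
    return (partition << RECORD_BITS) + record

# spark.range(0, 10, 1, 2) puts rows 0-4 in partition 0 and rows 5-9 in
# partition 1, reproducing the doctest output above: partition 1 starts
# at 1 << 33 = 8589934592.
ids = [sketch_id(p, r) for p in range(2) for r in range(5)]
print(ids)
```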
