{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":17165658,"defaultBranch":"master","name":"spark","ownerLogin":"apache","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2014-02-25T08:00:08.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/47359?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1726829072.0","currentOid":""},"activityList":{"items":[{"before":"e693e18c7d0e9a495dcb6e9b31dac9ce2b98428c","after":"e7ca790ed4f0b4d7c19d849b00a23474c391b79f","ref":"refs/heads/branch-3.5","pushedAt":"2024-09-20T09:56:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HeartSaVioR","name":"Jungtaek Lim","path":"/HeartSaVioR","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1317309?s=80&v=4"},"commit":{"message":"[SPARK-49699][SS] Disable PruneFilters for streaming workloads\n\nThe PR proposes to disable PruneFilters if the predicate of the filter is evaluated to `null` / `false` and the filter (and subtree) is streaming.\n\nPruneFilters replaces the `null` / `false` filter with an empty relation, which means the subtree of the filter is also lost. The optimization does not care about whichever operator is in the subtree, hence some important operators like stateful operator, watermark node, observe node could be lost.\n\nThe filter could be evaluated to `null` / `false` selectively among microbatches in various reasons (one simple example is the modification of the query during restart), which means stateful operator might not be available for batch N and be available for batch N + 1. For this case, streaming query will fail as batch N + 1 cannot load the state from batch N, and it's not recoverable in most cases.\n\nSee new tests in StreamingQueryOptimizationCorrectnessSuite for details.\n\nNo.\n\nUT.\n\nNo.\n\nCloses #48149 from n-young-db/n-young-db/disable-streaming-prune-filters.\n\nLead-authored-by: Nick Young \nCo-authored-by: Jungtaek Lim \nSigned-off-by: Jungtaek Lim \n(cherry picked from commit 46b0210edb4ef8490ee4bbc4a40baf202a531b33)\nSigned-off-by: Jungtaek Lim ","shortMessageHtmlLink":"[SPARK-49699][SS] Disable PruneFilters for streaming workloads"}},{"before":"b37863d2327131c670fe791576a907bcb5243cd6","after":"46b0210edb4ef8490ee4bbc4a40baf202a531b33","ref":"refs/heads/master","pushedAt":"2024-09-20T09:05:35.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HeartSaVioR","name":"Jungtaek Lim","path":"/HeartSaVioR","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1317309?s=80&v=4"},"commit":{"message":"[SPARK-49699][SS] Disable PruneFilters for streaming workloads\n\n### What changes were proposed in this pull request?\n\nThe PR proposes to disable PruneFilters if the predicate of the filter is evaluated to `null` / `false` and the filter (and subtree) is streaming.\n\n### Why are the changes needed?\n\nPruneFilters replaces the `null` / `false` filter with an empty relation, which means the subtree of the filter is also lost. The optimization does not care about whichever operator is in the subtree, hence some important operators like stateful operator, watermark node, observe node could be lost.\n\nThe filter could be evaluated to `null` / `false` selectively among microbatches in various reasons (one simple example is the modification of the query during restart), which means stateful operator might not be available for batch N and be available for batch N + 1. For this case, streaming query will fail as batch N + 1 cannot load the state from batch N, and it's not recoverable in most cases.\n\nSee new tests in StreamingQueryOptimizationCorrectnessSuite for details.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nUT.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #48149 from n-young-db/n-young-db/disable-streaming-prune-filters.\n\nLead-authored-by: Nick Young \nCo-authored-by: Jungtaek Lim \nSigned-off-by: Jungtaek Lim ","shortMessageHtmlLink":"[SPARK-49699][SS] Disable PruneFilters for streaming workloads"}},{"before":"c009cd061c4923955a1e7ec9bf6c045f93d27ef7","after":"b37863d2327131c670fe791576a907bcb5243cd6","ref":"refs/heads/master","pushedAt":"2024-09-20T07:40:39.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[MINOR][FOLLOWUP] Fix rat check for .nojekyll\n\n### What changes were proposed in this pull request?\n\nFix rat check for .nojekyll\n\n### Why are the changes needed?\n\nCI fix\n\n### Does this PR introduce _any_ user-facing change?\n\nno\n\n### How was this patch tested?\n\ndev/check-license\nIgnored 1 lines in your exclusion files as comments or empty lines.\nRAT checks passed.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nno\n\nCloses #48178 from yaooqinn/f.\n\nAuthored-by: Kent Yao \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[MINOR][FOLLOWUP] Fix rat check for .nojekyll"}},{"before":"6352c12f607bc092c33f1f29174d6699f8312380","after":"c009cd061c4923955a1e7ec9bf6c045f93d27ef7","ref":"refs/heads/master","pushedAt":"2024-09-20T07:16:08.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"MaxGekk","name":"Maxim Gekk","path":"/MaxGekk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1580697?s=80&v=4"},"commit":{"message":"[SPARK-49392][SQL][FOLLOWUP] Catch errors when failing to write to external data source\n\n### What changes were proposed in this pull request?\nChange `sqlState` to KD010.\n\n### Why are the changes needed?\nNecessary modification for the Databricks error class space.\n\n### Does this PR introduce _any_ user-facing change?\nYes, the new error message is now updated to KD010.\n\n### How was this patch tested?\nExisting tests (updated).\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #48165 from uros-db/external-data-source-fix.\n\nAuthored-by: Uros Bojanic \nSigned-off-by: Max Gekk ","shortMessageHtmlLink":"[SPARK-49392][SQL][FOLLOWUP] Catch errors when failing to write to ex…"}},{"before":"d4665fa1df716305acb49912d41c396b39343c93","after":"6352c12f607bc092c33f1f29174d6699f8312380","ref":"refs/heads/master","pushedAt":"2024-09-20T06:29:11.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[MINOR][INFRA] Disable 'pages build and deployment' action\n\n### What changes were proposed in this pull request?\n\nDisable https://github.com/apache/spark/actions/runs/10951008649/ via:\n\n> adding a .nojekyll file to the root of your source branch will bypass the Jekyll build process and deploy the content directly.\n\nhttps://docs.github.com/en/pages/quickstart\n\n### Why are the changes needed?\n\nrestore ci\n\n### Does this PR introduce _any_ user-facing change?\n\nno\n### How was this patch tested?\n\nno\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #48176 from yaooqinn/action.\n\nAuthored-by: Kent Yao \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[MINOR][INFRA] Disable 'pages build and deployment' action"}},{"before":"a5ac80af8e94afe56105c265a94d02ef878e1de9","after":"d4665fa1df716305acb49912d41c396b39343c93","ref":"refs/heads/master","pushedAt":"2024-09-20T05:11:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HeartSaVioR","name":"Jungtaek Lim","path":"/HeartSaVioR","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1317309?s=80&v=4"},"commit":{"message":"[SPARK-49677][SS] Ensure that changelog files are written on commit and forceSnapshot flag is also reset\n\n### What changes were proposed in this pull request?\nEnsure that changelog files are written on commit and forceSnapshot flag is also reset\n\n### Why are the changes needed?\nWithout these changes, we are not writing the changelog files per batch and we are also trying to upload full snapshot each time since the flag is not being reset correctly\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nAdded unit tests\n\nBefore:\n```\n[info] Run completed in 3 seconds, 438 milliseconds.\n[info] Total number of tests run: 1\n[info] Suites: completed 1, aborted 0\n[info] Tests: succeeded 0, failed 1, canceled 0, ignored 0, pending 0\n[info] *** 1 TEST FAILED ***\n```\n\nAfter:\n```\n[info] Run completed in 4 seconds, 155 milliseconds.\n[info] Total number of tests run: 1\n[info] Suites: completed 1, aborted 0\n[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0\n[info] All tests passed.\n```\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #48125 from anishshri-db/task/SPARK-49677.\n\nAuthored-by: Anish Shrigondekar \nSigned-off-by: Jungtaek Lim ","shortMessageHtmlLink":"[SPARK-49677][SS] Ensure that changelog files are written on commit a…"}},{"before":"ca726c10925a3677bf057f65ecf415e608c63cd5","after":"a5ac80af8e94afe56105c265a94d02ef878e1de9","ref":"refs/heads/master","pushedAt":"2024-09-20T00:29:58.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"zhengruifeng","name":"Ruifeng Zheng","path":"/zhengruifeng","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7322292?s=80&v=4"},"commit":{"message":"[SPARK-49713][PYTHON][CONNECT] Make function `count_min_sketch` accept number arguments\n\n### What changes were proposed in this pull request?\n1, Make function `count_min_sketch` accept number arguments;\n2, Make argument `seed` optional;\n3, fix the type hints of `eps/confidence/seed` from `ColumnOrName` to `Column`, because they require a foldable value and actually do not accept column name:\n```\nIn [3]: from pyspark.sql import functions as sf\n\nIn [4]: df = spark.range(10000).withColumn(\"seed\", sf.lit(1).cast(\"int\"))\n\nIn [5]: df.select(sf.hex(sf.count_min_sketch(\"id\", sf.lit(0.5), sf.lit(0.5), \"seed\")))\n...\nAnalysisException: [DATATYPE_MISMATCH.NON_FOLDABLE_INPUT] Cannot resolve \"count_min_sketch(id, 0.5, 0.5, seed)\" due to data type mismatch: the input `seed` should be a foldable \"INT\" expression; however, got \"seed\". SQLSTATE: 42K09;\n'Aggregate [unresolvedalias('hex(count_min_sketch(id#1L, 0.5, 0.5, seed#2, 0, 0)))]\n+- Project [id#1L, cast(1 as int) AS seed#2]\n +- Range (0, 10000, step=1, splits=Some(12))\n...\n```\n\n### Why are the changes needed?\n1, seed is optional in other similar functions;\n2, existing type hint is `ColumnOrName` which is misleading since column name is not actually supported\n\n### Does this PR introduce _any_ user-facing change?\nyes, it support number arguments\n\n### How was this patch tested?\nupdated doctests\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #48157 from zhengruifeng/py_fix_count_min_sketch.\n\nAuthored-by: Ruifeng Zheng \nSigned-off-by: Ruifeng Zheng ","shortMessageHtmlLink":"[SPARK-49713][PYTHON][CONNECT] Make function count_min_sketch accep…"}},{"before":"04455797bfb3631b13b41cfa5d2604db3bf8acc2","after":"ca726c10925a3677bf057f65ecf415e608c63cd5","ref":"refs/heads/master","pushedAt":"2024-09-20T00:16:31.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-49721][BUILD] Upgrade `protobuf-java` to 3.25.5\n\n### What changes were proposed in this pull request?\n\nThis PR aims to upgrade `protobuf-java` to 3.25.5.\n\n### Why are the changes needed?\n\nTo bring the latest bug fixes.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nPass the CIs.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #48170\n\nCloses #48171 from dongjoon-hyun/SPARK-49721.\n\nAuthored-by: Dongjoon Hyun \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-49721][BUILD] Upgrade protobuf-java to 3.25.5"}},{"before":"1142417f6a9f4cd646c880f099ce2e6e61225e0c","after":null,"ref":"refs/heads/dependabot/maven/com.google.protobuf-protobuf-java-3.25.5","pushedAt":"2024-09-19T23:37:21.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"dependabot[bot]","name":null,"path":"/apps/dependabot","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/29110?s=80&v=4"}},{"before":"6d1815eceea2003de2e3602f0f64e8188e8288d8","after":"04455797bfb3631b13b41cfa5d2604db3bf8acc2","ref":"refs/heads/master","pushedAt":"2024-09-19T19:32:33.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-49720][PYTHON][INFRA] Add a script to clean up PySpark temp files\n\n### What changes were proposed in this pull request?\nAdd a script to clean up PySpark temp files\n\n### Why are the changes needed?\nSometimes I encounter weird issues due to the out-dated `pyspark.zip` file, and removing it can result in expected behavior.\nSo I think we can add such a script.\n\n### Does this PR introduce _any_ user-facing change?\nno, dev-only\n\n### How was this patch tested?\nmanually test\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #48167 from zhengruifeng/py_infra_cleanup.\n\nAuthored-by: Ruifeng Zheng \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-49720][PYTHON][INFRA] Add a script to clean up PySpark temp files"}},{"before":"92cad2abd54e775259dc36d2f90242460d72a174","after":"6d1815eceea2003de2e3602f0f64e8188e8288d8","ref":"refs/heads/master","pushedAt":"2024-09-19T19:31:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-49718][PS] Switch `Scatter` plot to sampled data\n\n### What changes were proposed in this pull request?\nSwitch `Scatter` plot to sampled data\n\n### Why are the changes needed?\nwhen the data distribution has relationship with the order, the first n rows will not be representative of the whole dataset\n\nfor example:\n```\nimport pandas as pd\nimport numpy as np\nimport pyspark.pandas as ps\n\n# ps.set_option(\"plotting.max_rows\", 10000)\nnp.random.seed(123)\n\npdf = pd.DataFrame(np.random.randn(10000, 4), columns=list('ABCD')).sort_values(\"A\")\npsdf = ps.DataFrame(pdf)\n\npsdf.plot.scatter(x='B', y='A')\n```\n\nall 10k datapoints:\n![image](https://github.com/user-attachments/assets/72cf7e97-ad10-41e0-a8a6-351747d5285f)\n\nbefore (first 1k datapoints):\n![image](https://github.com/user-attachments/assets/1ed50d2c-7772-4579-a84c-6062542d9367)\n\nafter (sampled 1k datapoints):\n![image](https://github.com/user-attachments/assets/6c684cba-4119-4c38-8228-2bedcdeb9e59)\n\n### Does this PR introduce _any_ user-facing change?\nyes\n\n### How was this patch tested?\nci and manually test\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #48164 from zhengruifeng/ps_scatter_sampling.\n\nAuthored-by: Ruifeng Zheng \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-49718][PS] Switch Scatter plot to sampled data"}},{"before":"2c06ef1a49d8c81bdc1b880d7b0e8319186c2004","after":"cb89d18a4d750fc88e5d747601352488223e97b5","ref":"refs/heads/branch-3.4","pushedAt":"2024-09-19T19:19:16.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-46535][SQL][3.4] Fix NPE when describe extended a column without col stats\n\n### What changes were proposed in this pull request?\n\nBackport [#44524 ] to 3.4 for [[SPARK-46535]](https://issues.apache.org/jira/browse/SPARK-46535)[SQL] Fix NPE when describe extended a column without col stats\n\n### Why are the changes needed?\n\nCurrently executing DESCRIBE TABLE EXTENDED a column without col stats with v2 table will throw a null pointer exception.\n\n```\nCannot invoke \"org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()\" because the return value of \"scala.Option.get()\" is null\njava.lang.NullPointerException: Cannot invoke \"org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()\" because the return value of \"scala.Option.get()\" is null\n\tat org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)\n\tat org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)\n\tat org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)\n\tat org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)\n\tat org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)\n\tat org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)\n\tat org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)\n\tat org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)\n```\n\n### Does this PR introduce _any_ user-facing change?\n\n### How was this patch tested?\n\nAdd a new test describe extended (formatted) a column without col stats\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #48160 from saitharun15/SPARK-46535-branch-3.4.\n\nLead-authored-by: saitharun15 \nCo-authored-by: Sai Tharun \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-46535][SQL][3.4] Fix NPE when describe extended a column witho…"}},{"before":"373928082d01850abf6f503f7dec7ecaa6845ade","after":null,"ref":"refs/heads/dependabot/bundler/docs/google-protobuf-3.25.5","pushedAt":"2024-09-19T17:21:51.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"dependabot[bot]","name":null,"path":"/apps/dependabot","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/29110?s=80&v=4"}},{"before":"f0fb0c89ec29b587569d68a824c4ce7543721c06","after":"92cad2abd54e775259dc36d2f90242460d72a174","ref":"refs/heads/master","pushedAt":"2024-09-19T17:09:38.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-49716][PS][DOCS][TESTS] Fix documentation and add test of barh plot\n\n### What changes were proposed in this pull request?\n- Update the documentation for barh plot to clarify the difference between axis interpretation in Plotly and Matplotlib.\n- Test multiple columns as value axis.\n\nThe parameter difference is demonstrated as below.\n```py\n>>> df = ps.DataFrame({'lab': ['A', 'B', 'C'], 'val': [10, 30, 20]})\n>>> df.plot.barh(x='val', y='lab').show() # plot1\n\n>>> ps.set_option('plotting.backend', 'matplotlib')\n>>> import matplotlib.pyplot as plt\n>>> df.plot.barh(x='lab', y='val')\n>>> plt.show() # plot2\n```\n\nplot1\n![newplot (5)](https://github.com/user-attachments/assets/f1b6fabe-9509-41bb-8cfb-0733f65f1643)\n\nplot2\n![Figure_1](https://github.com/user-attachments/assets/10e1b65f-6116-4490-9956-29e1fbf0c053)\n\n### Why are the changes needed?\nThe barh plot’s x and y axis behavior differs between Plotly and Matplotlib, which may confuse users. The updated documentation and tests help ensure clarity and prevent misinterpretation.\n\n### Does this PR introduce _any_ user-facing change?\nNo. Doc change only.\n\n### How was this patch tested?\nUnit tests.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #48161 from xinrong-meng/ps_barh.\n\nAuthored-by: Xinrong Meng \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-49716][PS][DOCS][TESTS] Fix documentation and add test of barh…"}},{"before":"94dca78c128ff3d1571326629b4100ee092afb54","after":"f0fb0c89ec29b587569d68a824c4ce7543721c06","ref":"refs/heads/master","pushedAt":"2024-09-19T17:06:49.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-49719][SQL] Make `UUID` and `SHUFFLE` accept integer `seed`\n\n### What changes were proposed in this pull request?\nMake `UUID` and `SHUFFLE` accept integer `seed`\n\n### Why are the changes needed?\nIn most cases, `seed` accept both int and long, but `UUID` and `SHUFFLE` only accept long seed\n\n```py\nIn [1]: spark.sql(\"SELECT RAND(1L), RAND(1), SHUFFLE(array(1, 20, 3, 5), 1L), UUID(1L)\").show()\n+------------------+------------------+---------------------------+--------------------+\n| rand(1)| rand(1)|shuffle(array(1, 20, 3, 5))| uuid()|\n+------------------+------------------+---------------------------+--------------------+\n|0.6363787615254752|0.6363787615254752| [20, 1, 3, 5]|1ced31d7-59ef-4bb...|\n+------------------+------------------+---------------------------+--------------------+\n\nIn [2]: spark.sql(\"SELECT UUID(1)\").show()\n...\nAnalysisException: [INVALID_PARAMETER_VALUE.LONG] The value of parameter(s) `seed` in `UUID` is invalid: expects a long literal, but got \"1\". SQLSTATE: 22023; line 1 pos 7\n...\n\nIn [3]: spark.sql(\"SELECT SHUFFLE(array(1, 20, 3, 5), 1)\").show()\n...\nAnalysisException: [INVALID_PARAMETER_VALUE.LONG] The value of parameter(s) `seed` in `shuffle` is invalid: expects a long literal, but got \"1\". SQLSTATE: 22023; line 1 pos 7\n...\n```\n\n### Does this PR introduce _any_ user-facing change?\nyes\n\nafter this fix:\n```py\nIn [2]: spark.sql(\"SELECT SHUFFLE(array(1, 20, 3, 5), 1L), SHUFFLE(array(1, 20, 3, 5), 1), UUID(1L), UUID(1)\").show()\n+---------------------------+---------------------------+--------------------+--------------------+\n|shuffle(array(1, 20, 3, 5))|shuffle(array(1, 20, 3, 5))| uuid()| uuid()|\n+---------------------------+---------------------------+--------------------+--------------------+\n| [20, 1, 3, 5]| [20, 1, 3, 5]|1ced31d7-59ef-4bb...|1ced31d7-59ef-4bb...|\n+---------------------------+---------------------------+--------------------+--------------------+\n```\n\n### How was this patch tested?\nadded tests\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #48166 from zhengruifeng/int_seed.\n\nAuthored-by: Ruifeng Zheng \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-49719][SQL] Make UUID and SHUFFLE accept integer seed"}},{"before":null,"after":"1142417f6a9f4cd646c880f099ce2e6e61225e0c","ref":"refs/heads/dependabot/maven/com.google.protobuf-protobuf-java-3.25.5","pushedAt":"2024-09-19T16:27:23.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"dependabot[bot]","name":null,"path":"/apps/dependabot","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/29110?s=80&v=4"},"commit":{"message":"Bump com.google.protobuf:protobuf-java from 3.25.4 to 3.25.5\n\nBumps [com.google.protobuf:protobuf-java](https://github.com/protocolbuffers/protobuf) from 3.25.4 to 3.25.5.\n- [Release notes](https://github.com/protocolbuffers/protobuf/releases)\n- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/protobuf_release.bzl)\n- [Commits](https://github.com/protocolbuffers/protobuf/compare/v3.25.4...v3.25.5)\n\n---\nupdated-dependencies:\n- dependency-name: com.google.protobuf:protobuf-java\n dependency-type: direct:production\n...\n\nSigned-off-by: dependabot[bot] ","shortMessageHtmlLink":"Bump com.google.protobuf:protobuf-java from 3.25.4 to 3.25.5"}},{"before":null,"after":"373928082d01850abf6f503f7dec7ecaa6845ade","ref":"refs/heads/dependabot/bundler/docs/google-protobuf-3.25.5","pushedAt":"2024-09-19T16:26:29.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"dependabot[bot]","name":null,"path":"/apps/dependabot","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/29110?s=80&v=4"},"commit":{"message":"Bump google-protobuf from 3.25.3 to 3.25.5 in /docs\n\nBumps [google-protobuf](https://github.com/protocolbuffers/protobuf) from 3.25.3 to 3.25.5.\n- [Release notes](https://github.com/protocolbuffers/protobuf/releases)\n- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/protobuf_release.bzl)\n- [Commits](https://github.com/protocolbuffers/protobuf/compare/v3.25.3...v3.25.5)\n\n---\nupdated-dependencies:\n- dependency-name: google-protobuf\n dependency-type: indirect\n...\n\nSigned-off-by: dependabot[bot] ","shortMessageHtmlLink":"Bump google-protobuf from 3.25.3 to 3.25.5 in /docs"}},{"before":"398457af59875120ea8b3ed44468a51597e6a441","after":"94dca78c128ff3d1571326629b4100ee092afb54","ref":"refs/heads/master","pushedAt":"2024-09-19T13:11:00.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"zhengruifeng","name":"Ruifeng Zheng","path":"/zhengruifeng","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7322292?s=80&v=4"},"commit":{"message":"[SPARK-49693][PYTHON][CONNECT] Refine the string representation of `timedelta`\n\n### What changes were proposed in this pull request?\nRefine the string representation of `timedelta`, by following the ISO format.\nNote that the used units in JVM side (`Duration`) and Pandas are different.\n\n### Why are the changes needed?\nWe should not leak the raw data\n\n### Does this PR introduce _any_ user-facing change?\nyes\n\nPySpark Classic:\n```\nIn [1]: from pyspark.sql import functions as sf\n\nIn [2]: import datetime\n\nIn [3]: sf.lit(datetime.timedelta(1, 1))\nOut[3]: Column<'PT24H1S'>\n```\n\nPySpark Connect (before):\n```\nIn [1]: from pyspark.sql import functions as sf\n\nIn [2]: import datetime\n\nIn [3]: sf.lit(datetime.timedelta(1, 1))\nOut[3]: Column<'86401000000'>\n```\n\nPySpark Connect (after):\n```\nIn [1]: from pyspark.sql import functions as sf\n\nIn [2]: import datetime\n\nIn [3]: sf.lit(datetime.timedelta(1, 1))\nOut[3]: Column<'P1DT0H0M1S'>\n```\n\n### How was this patch tested?\nadded test\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #48159 from zhengruifeng/pc_lit_delta.\n\nAuthored-by: Ruifeng Zheng \nSigned-off-by: Ruifeng Zheng ","shortMessageHtmlLink":"[SPARK-49693][PYTHON][CONNECT] Refine the string representation of `t…"}},{"before":"4068fbcc0de59154db9bdeb1296bd24059db9f42","after":"398457af59875120ea8b3ed44468a51597e6a441","ref":"refs/heads/master","pushedAt":"2024-09-19T13:02:43.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"asfgit","name":null,"path":"/asfgit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1341245?s=80&v=4"},"commit":{"message":"[SPARK-49422][CONNECT][SQL] Add groupByKey to sql/api\n\n### What changes were proposed in this pull request?\nThis PR adds `Dataset.groupByKey(..)` to the shared interface. I forgot to add in the previous PR.\n\n### Why are the changes needed?\nThe shared interface needs to support all functionality.\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nExisting tests.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #48147 from hvanhovell/SPARK-49422-follow-up.\n\nAuthored-by: Herman van Hovell \nSigned-off-by: Herman van Hovell ","shortMessageHtmlLink":"[SPARK-49422][CONNECT][SQL] Add groupByKey to sql/api"}},{"before":"a060c236d314bd2facc73ad26926b59401e5f7aa","after":"4068fbcc0de59154db9bdeb1296bd24059db9f42","ref":"refs/heads/master","pushedAt":"2024-09-19T13:01:09.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"zhengruifeng","name":"Ruifeng Zheng","path":"/zhengruifeng","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7322292?s=80&v=4"},"commit":{"message":"[SPARK-49717][SQL][TESTS] Function parity test ignore private[xxx] functions\n\n### What changes were proposed in this pull request?\nFunction parity test ignore private functions\n\n### Why are the changes needed?\nexisting test is based on `java.lang.reflect.Modifier` which cannot properly handle `private[xxx]`\n\n### Does this PR introduce _any_ user-facing change?\nno, test only\n\n### How was this patch tested?\nci\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #48163 from zhengruifeng/df_func_test.\n\nAuthored-by: Ruifeng Zheng \nSigned-off-by: Ruifeng Zheng ","shortMessageHtmlLink":"[SPARK-49717][SQL][TESTS] Function parity test ignore private[xxx] fu…"}},{"before":"ac34f1de92c6f5cb53d799f00e550a0a204d9eb2","after":"a060c236d314bd2facc73ad26926b59401e5f7aa","ref":"refs/heads/master","pushedAt":"2024-09-19T12:25:58.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-49667][SQL] Disallowed CS_AI collators with expressions that use StringSearch\n\n### What changes were proposed in this pull request?\n\nIn this PR, I propose to disallow `CS_AI` collated strings in expressions that use `StringsSearch` in their implementation. These expressions are `trim`, `startswith`, `endswith`, `locate`, `instr`, `str_to_map`, `contains`, `replace`, `split_part` and `substring_index`.\n\nCurrently, these expressions support all possible collations, however, they do not work properly with `CS_AI` collators. This is because there is no support for `CS_AI` search in the ICU's `StringSearch` class which is used to implement these expressions. Therefore, the expressions are not behaving correctly when used with `CS_AI` collators (e.g. currently `startswith('hOtEl' collate unicode_ai, 'Hotel' collate unicode_ai)` returns `true`).\n\n### Why are the changes needed?\n\nProposed changes are necessary in order to achieve correct behavior of the expressions mentioned above.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nThis patch was tested by adding a test in the `CollationSuite`.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #48121 from vladanvasi-db/vladanvasi-db/cs-ai-collations-expressions-disablement.\n\nAuthored-by: Vladan Vasić \nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-49667][SQL] Disallowed CS_AI collators with expressions that u…"}},{"before":"492d1b14c0d19fa89b9ce9c0e48fc0e4c120b70c","after":"ac34f1de92c6f5cb53d799f00e550a0a204d9eb2","ref":"refs/heads/master","pushedAt":"2024-09-19T09:56:14.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48280][SQL][FOLLOW-UP] Add expressions that are built via expressionBuilder to Expression Walker\n\n### What changes were proposed in this pull request?\nAddition of new expressions to expression walker. This PR also improves descriptions of methods in the Suite.\n\n### Why are the changes needed?\nIt was noticed while debugging that startsWith, endsWith and contains are not tested with this suite and these expressions represent core of collation testing.\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nTest only.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #48162 from mihailom-db/expressionwalkerfollowup.\n\nAuthored-by: Mihailo Milosevic \nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-48280][SQL][FOLLOW-UP] Add expressions that are built via expr…"}},{"before":"3bdf146bbee58d207afaadc92024d9f6c4b941dd","after":"492d1b14c0d19fa89b9ce9c0e48fc0e4c120b70c","ref":"refs/heads/master","pushedAt":"2024-09-19T09:09:44.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48782][SQL] Add support for executing procedures in catalogs\n\n### What changes were proposed in this pull request?\n\nThis PR adds support for executing procedures in catalogs.\n\n### Why are the changes needed?\n\nThese changes are needed per [discussed and voted](https://lists.apache.org/thread/w586jr53fxwk4pt9m94b413xyjr1v25m) SPIP tracked in [SPARK-44167](https://issues.apache.org/jira/browse/SPARK-44167).\n\n### Does this PR introduce _any_ user-facing change?\n\nYes. This PR adds CALL commands.\n\n### How was this patch tested?\n\nThis PR comes with tests.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #47943 from aokolnychyi/spark-48782.\n\nAuthored-by: Anton Okolnychyi \nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-48782][SQL] Add support for executing procedures in catalogs"}},{"before":"f3c8d26eb0c3fd7f77950eb08c70bb2a9ab6493c","after":"3bdf146bbee58d207afaadc92024d9f6c4b941dd","ref":"refs/heads/master","pushedAt":"2024-09-19T07:27:42.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"MaxGekk","name":"Maxim Gekk","path":"/MaxGekk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1580697?s=80&v=4"},"commit":{"message":"[SPARK-49611][SQL][FOLLOW-UP] Fix wrong results of collations() TVF\n\n### What changes were proposed in this pull request?\nFix of accent sensitive and case sensitive column results.\n\n### Why are the changes needed?\nWhen initial PR was introduced, ICU collation listing ended up with different order of generating columns so results were wrong.\n\n### Does this PR introduce _any_ user-facing change?\nNo, as spark 4.0 was not released yet.\n\n### How was this patch tested?\nExisting test in CollationSuite.scala, which was wrong in the first place.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #48152 from mihailom-db/tvf-collations-followup.\n\nAuthored-by: Mihailo Milosevic \nSigned-off-by: Max Gekk ","shortMessageHtmlLink":"[SPARK-49611][SQL][FOLLOW-UP] Fix wrong results of collations() TVF"}},{"before":"8861f0f9af3f397921ba1204cf4f76f4e20680bb","after":"f3c8d26eb0c3fd7f77950eb08c70bb2a9ab6493c","ref":"refs/heads/master","pushedAt":"2024-09-19T01:36:08.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"Revert \"[SPARK-49422][CONNECT][SQL] Add groupByKey to sql/api\"\n\nThis reverts commit af45902d33c4d8e38a6427ac1d0c46fe057bb45a.","shortMessageHtmlLink":"Revert \"[SPARK-49422][CONNECT][SQL] Add groupByKey to sql/api\""}},{"before":"af45902d33c4d8e38a6427ac1d0c46fe057bb45a","after":"8861f0f9af3f397921ba1204cf4f76f4e20680bb","ref":"refs/heads/master","pushedAt":"2024-09-19T00:16:40.000Z","pushType":"push","commitsCount":3,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"Revert \"[SPARK-49495][DOCS] Document and Feature Preview on the master branch via Live GitHub Pages Updates\"\n\nThis reverts commit b1807095bef9c6d98e60bdc2669c8af93bc68ad4.","shortMessageHtmlLink":"Revert \"[SPARK-49495][DOCS] Document and Feature Preview on the maste…"}},{"before":"3b34891e5b9c2694b7ffdc265290e25847dc3437","after":"af45902d33c4d8e38a6427ac1d0c46fe057bb45a","ref":"refs/heads/master","pushedAt":"2024-09-19T00:11:31.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"asfgit","name":null,"path":"/asfgit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1341245?s=80&v=4"},"commit":{"message":"[SPARK-49422][CONNECT][SQL] Add groupByKey to sql/api\n\n### What changes were proposed in this pull request?\nThis PR adds `Dataset.groupByKey(..)` to the shared interface. I forgot to add in the previous PR.\n\n### Why are the changes needed?\nThe shared interface needs to support all functionality.\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nExisting tests.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #48147 from hvanhovell/SPARK-49422-follow-up.\n\nAuthored-by: Herman van Hovell \nSigned-off-by: Herman van Hovell ","shortMessageHtmlLink":"[SPARK-49422][CONNECT][SQL] Add groupByKey to sql/api"}},{"before":"db8010b4c8be6f1c50f35cbde3efa44cd5d45adf","after":"3b34891e5b9c2694b7ffdc265290e25847dc3437","ref":"refs/heads/master","pushedAt":"2024-09-19T00:10:54.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-49684][CONNECT] Remove global locks from session and execution managers\n\n### What changes were proposed in this pull request?\n\nEliminate the use of global locks in the session and execution managers. Those locks residing in the streaming query manager cannot be easily removed because the tag and query maps seemingly need to be synchronised.\n\n### Why are the changes needed?\n\nIn order to achieve true scalability.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nExisting tests.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #48131 from changgyoopark-db/SPARK-49684.\n\nAuthored-by: Changgyoo Park \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-49684][CONNECT] Remove global locks from session and execution…"}},{"before":"5c48806a2941070e23a81b4e7e4f3225fe341535","after":"db8010b4c8be6f1c50f35cbde3efa44cd5d45adf","ref":"refs/heads/master","pushedAt":"2024-09-19T00:10:28.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"asfgit","name":null,"path":"/asfgit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1341245?s=80&v=4"},"commit":{"message":"[SPARK-49568][CONNECT][SQL] Remove self type from Dataset\n\n### What changes were proposed in this pull request?\nThis PR removes the self type parameter from Dataset. This turned out to be a bit noisy. The self type is replaced by a combination of covariant return types and abstract types. Abstract types are used when a method takes a Dataset (or a KeyValueGroupedDataset) as an argument.\n\n### Why are the changes needed?\nThe self type made using the classes in sql/api a bit noisy.\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nExisting tests.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #48146 from hvanhovell/SPARK-49568.\n\nAuthored-by: Herman van Hovell \nSigned-off-by: Herman van Hovell ","shortMessageHtmlLink":"[SPARK-49568][CONNECT][SQL] Remove self type from Dataset"}},{"before":"669e63a34012404d8d864cd6294f799b672f6f9a","after":"5c48806a2941070e23a81b4e7e4f3225fe341535","ref":"refs/heads/master","pushedAt":"2024-09-19T00:09:02.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-49688][CONNECT][TESTS] Fix a sporadic `SparkConnectServiceSuite` failure\n\n### What changes were proposed in this pull request?\n\nAdd a short wait loop to ensure that the test pre-condition is met. To be specific, VerifyEvents.executeHolder is set asynchronously by MockSparkListener.onOtherEvent whereas the test assumes that VerifyEvents.executeHolder is always available.\n\n### Why are the changes needed?\n\nFor smoother development experience.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nSparkConnectServiceSuite.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #48142 from changgyoopark-db/SPARK-49688.\n\nAuthored-by: Changgyoo Park \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-49688][CONNECT][TESTS] Fix a sporadic `SparkConnectServiceSuit…"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEu80DZwA","startCursor":null,"endCursor":null}},"title":"Activity · apache/spark"}