Change log

Generated on 2024-02-17

Release 24.02

Features

#9926 [FEA] Add config option for the parquet reader input read limit.
#10270 [FEA] Add support for single quotes when reading JSON
#10253 [FEA] Enable mixed types as string in GpuJsonToStruct
#9692 [FEA] Remove Pascal support
#8806 [FEA] Support lazy quantifier and specified group index in regexp_extract function
#10079 [FEA] Add string parameter support for unix_timestamp for non-UTC time zones
#9667 [FEA][JSON] Add support for non default dateFormat in from_json
#9173 [FEA] Support format_number
#10145 [FEA] Support to_utc_timestamp
#9927 [FEA] Support to_date with non-UTC timezones without DST
#10006 [FEA] Support ParseToTimestamp for non-UTC time zones
#9096 [FEA] Add Spark 3.3.4 support
#9585 [FEA] support ascii function
#9260 [FEA] Create Spark 3.4.2 shim and build env
#10076 [FEA] Add performance test framework for non-UTC time zone features.
#9881 [TASK] Remove spark.rapids.sql.nonUTC.enabled configuration option
#9801 [FEA] Support DateFormat on GPU with a non-UTC timezone
#6834 [FEA] Support GpuHour expression for timezones other than UTC
#6842 [FEA] Support TimeZone aware operations for value extraction
#1860 [FEA] Optimize row based window operations for BOUNDED ranges
#9606 [FEA] Support unix_timestamp with CST(China Time Zone) support
#9815 [FEA] Support unix_timestamp for non-DST timezones
#8807 [FEA] support ‘yyyyMMdd’ format in from_unixtime function
#9605 [FEA] Support from_unixtime with CST(China Time Zone) support
#6836 [FEA] Support FromUnixTime for non UTC timezones
#9175 [FEA] Support Databricks 13.3
#6881 [FEA] Support RAPIDS Spark plugin on ARM
#9274 [FEA] Regular deploy process to include arm artifacts
#9844 [FEA] Let Gpu arrow python runners support writing one batch one time for the single threaded model.
#7309 [FEA] Detect multiple versions of the RAPIDS jar on the classpath at the same time

Performance

#9442 [FEA] For hash joins where the build side can change use the smaller table for the build side
#10142 [TASK] Benchmark existing timestamp functions that work in non-UTC time zone (non-DST)

Bugs Fixed

#9974 [BUG] host memory Leak in MultiFileCoalescingPartitionReaderBase in UTC time zone
#10359 [BUG] Build failure on Databricks nightly run with GpuMapInPandasExecMeta
#10327 [BUG] Unit test FAILED against : SPARK-24957: average with decimal followed by aggregation returning wrong result
#10324 [BUG] hash_aggregate_test.py test FAILED: Type conversion is not allowed from Table {...}
#10291 [BUG] SIGSEGV in libucp.so
#9212 [BUG] from_json fails with cuDF error Invalid list size computation error
#10264 [BUG] hash aggregate test failures due to type conversion errors
#10262 [BUG] Test "SPARK-24957: average with decimal followed by aggregation returning wrong result" failed.
#9353 [BUG] [JSON] A mix of lists and structs within the same column is not supported
#10099 [BUG] orc_test.py::test_orc_scan_with_aggregate_pushdown fails with a standalone cluster on spark 3.3.0
#10047 [BUG] CudfException during conditional hash join while running nds query64
#9779 [BUG] 330cdh failed test_hash_reduction_sum_full_decimal on CI
#10197 [BUG] Disable GetJsonObject by default and update docs
#10165 [BUG] Databricks 13.3 executor side broadcast failure
#10224 [BUG] DBR builds fails when installing Maven
#10222 [BUG] to_utc_timestamp and from_utc_timestamp fallback when TZ is supported time zone
#10195 [BUG] test_window_aggs_for_negative_rows_partitioned failure in CI
#10182 [BUG] test_dpp_bypass / test_dpp_via_aggregate_subquery failures in CI (databricks)
#10169 [BUG] Host column vector leaks when running test_cast_timestamp_to_date
#10050 [BUG] test_cast_decimal_to_decimal[to:DecimalType(1,-1)-from:Decimal(5,-3)] fails with DATAGEN_SEED=1702439569
#10088 [BUG] GpuExplode single row split to fit cuDF limits
#10174 [BUG] json_test.py::test_from_json_struct_timestamp failed on: Part of the plan is not columnar
#10186 [BUG] test_to_date_with_window_functions failed in non-UTC nightly CI
#10154 [BUG] 'spark-test.sh' integration tests FAILED on 'ps: command not found' in Rocky Docker environment
#10175 [BUG] string_test.py::test_format_number_float_special FAILED : AssertionError 'NaN' ==
#10166 Detect Undeclared Shim in POM.xml
#10170 [BUG] test_cast_timestamp_to_date fails with TZ=Asia/Hebron
#10149 [BUG] GPU illegal access detected during delta_byte_array.parquet read
#9905 [BUG] GpuJsonScan incorrect behavior when parsing dates
#10163 Spark 3.3.4 Shim Build Failure
#10105 [BUG] scala:compile is not thread safe unless compiler bridge already exists
#10026 [BUG] test_hash_agg_with_nan_keys failed with a DATAGEN_SEED=1702335559
#10075 [BUG] non-pinned blocking alloc with spill unit test failed in HostAllocSuite
#10134 [BUG] test_window_aggs_for_batched_finite_row_windows_partitioned failed on Scala 2.13 with DATAGEN_SEED=1704033145
#10118 [BUG] non-UTC Nightly CI failed
#10136 [BUG] The canonicalized versions of GpuFileSourceScanExecs that are supposed to be semantic-equal can be different
#10110 [BUG] disable collect_list and collect_set for window operations by default.
#10129 [BUG] Unit test suite fails with Null data pointer in GpuTimeZoneDB
#10089 [BUG] DATAGEN_SEED= environment does not override the marker datagen_overrides
#10108 [BUG] @datagen_overrides seed is sticky when it shouldn't be
#10064 [BUG] test_unsupported_fallback_regexp_replace failed with DATAGEN_SEED=1702662063
#10117 [BUG] test_from_utc_timestamp failed on Cloudera Env when TZ is Iran
#9914 [BUG] Report GPU OOM on recent passed CI premerges.
#10094 [BUG] spark351 PR check failure MockTaskContext method isFailed in class TaskContext of type ()Boolean is not defined
#10017 [BUG] test_casting_from_double_to_timestamp failed for DATAGEN_SEED=1702329497
#9992 [BUG] conditionals_test.py::test_conditional_with_side_effects_cast[String] failed with DATAGEN_SEED=1701976979
#9743 [BUG][AUDIT] SPARK-45652 - SPJ: Handle empty input partitions after dynamic filtering
#9859 [AUDIT] [SPARK-45786] Inaccurate Decimal multiplication and division results
#9555 [BUG] Scala 2.13 build with JDK 11 or 17 fails OpcodeSuite tests
#10073 [BUG] test_csv_prefer_date_with_infer_schema failed with DATAGEN_SEED=1702847907
#10004 [BUG] If a host memory buffer is spilled, it cannot be unspilled
#10063 [BUG] CI build failure with 341db: method getKillReason has weaker access privileges; it should be public
#10055 [BUG] array_test.py::test_array_transform_non_deterministic failed with non-UTC time zone
#10056 [BUG] Unit tests ToPrettyStringSuite FAILED on spark-3.5.0
#10048 [BUG] Fix out of range error from pySpark in test_timestamp_millis and other two integration test cases
#4204 casting double to string does not match Spark
#9938 Better to do some refactor for the Python UDF code
#10018 [BUG] GpuToUnixTimestampImproved off by 1 on GPU when handling timestamp before epoch
#10012 [BUG] test_str_to_map_expr_random_delimiters with DATAGEN_SEED=1702166057 hangs
#10029 [BUG] doc links fail with 404 for shims.md
#9472 [BUG] Non-Deterministic expressions in an array_transform can cause errors
#9884 [BUG] delta_lake_delete_test.py failed assertion [DATAGEN_SEED=1701225104, IGNORE_ORDER...
#9977 [BUG] test_cast_date_integral fails on databricks 3.4.1
#9936 [BUG] Nightly CI of non-UTC time zone reports 'year 0 is out of range' error
#9941 [BUG] A potential data corruption in Pandas UDFs
#9897 [BUG] Error message for multiple jars on classpath is wrong
#9916 [BUG] test_cast_string_ts_valid_format failed at seed = 1701362564
#9559 [BUG] precommit regularly fails with error trying to download a dependency
#9708 [BUG] test_cast_string_ts_valid_format fails with DATAGEN_SEED=1699978422

PRs

#10439 Reverts NVIDIA#10232 and fixes the plugin build on Databricks 11.3
#10380 Init changelog 24.02 [skip ci]
#10367 Update rapids JNI and private version to release 24.02.0
#10414 [DOC] Fix 24.02.0 documentation errors [skip ci]
#10403 Cherry-pick: Fix a memory leak in json tuple (#10360)
#10387 [DOC] Update docs for 24.02.0 release [skip ci]
#10399 Update NOTICE-binary
#10389 Change version and branch to 24.02 in docs [skip ci]
#10309 [DOC] add custom 404 page and fix some document issue [skip ci]
#10352 xfail mixed type test
#10355 Revert "Support barrier mode for mapInPandas/mapInArrow (#10343)"
#10353 Use fixed seed for test_from_json_struct_decimal
#10343 Support barrier mode for mapInPandas/mapInArrow
#10345 Fix auto merge conflict 10339 [skip ci]
#9991 Start to use explicit memory limits in the parquet chunked reader
#10328 Fix typo in spark-tests.sh [skip ci]
#10279 Run '--packages' only with default cuda11 jar
#10273 Support reading JSON data with single quotes around attribute names and values
#10306 Fix performance regression in from_json
#10272 Add FullOuter support to GpuShuffledSymmetricHashJoinExec
#10260 Add perf test for time zone operators
#10275 Add tests for window Python udf with array input
#10278 Clean up $M2_CACHE to avoid side-effect of previous dependency:get [skip ci]
#10268 Add config to enable mixed types as string in GpuJsonToStruct & GpuJsonScan
#10297 Revert "UCX 1.16.0 upgrade (#10190)"
#10289 Add gerashegalov to CODEOWNERS [skip ci]
#10290 Fix merge conflict with 23.12 [skip ci]
#10190 UCX 1.16.0 upgrade
#10211 Use parse_url kernel for QUERY literal and column key
#10267 Update to libcudf unsigned sum aggregation types change
#10208 Added Support for Lazy Quantifier
#9993 Enable mixed types as string in GpuJsonScan
#10246 Refactor full join iterator to allow access to build tracker
#10257 Enable auto-merge from branch-24.02 to branch-24.04 [skip CI]
#10178 Mark hash reduction decimal overflow test as a permanent seed override
#10244 Use POSIX mode in assembly plugin to avoid issues with large UID/GID
#10238 Smoke test with '--package' to fetch the plugin jar
#10201 Deploy release candidates to local maven repo for dependency check[skip ci]
#10240 Improved inner joins with large build side
#10220 Disable GetJsonObject by default and add tests for as many issues with it as possible
#10230 Fix Databricks 13.3 BroadcastHashJoin using executor side broadcast fed by ColumnarToRow [Databricks]
#10232 Fixed 330db Shims to Adopt the PythonRunner Changes
#10225 Download Maven from apache.org archives [skip ci]
#10210 Add string parameter support for unix_timestamp for non-UTC time zones
#10223 Fix to_utc_timestamp and from_utc_timestamp fallback when TZ is supported time zone
#10205 Deterministic ordering in window tests
#10204 Further prevent degenerative joins in dpp_test
#10156 Update string to float compatibility doc[skip ci]
#10193 Fix explode with carry-along columns on GpuExplode single row retry handling
#10191 Updating the config documentation for filecache configs [skip ci]
#10131 With a single row GpuExplode tries to split the generator array
#10179 Fix build regression against Spark 3.2.x
#10189 test needs marks for non-UTC and for non_supported timezones
#10176 Fix format_number NaN symbol in high jdk version
#10074 Update the legacy mode check: only take effect when reading date/timestamp column
#10167 Defined Shims Should Be Declared In POM
#10168 Prevent a degenerative join in test_dpp_reuse_broadcast_exchange
#10171 Fix test_cast_timestamp_to_date when running in a DST time zone
#9975 Improve dateFormat support in GpuJsonScan and make tests consistent with GpuStructsToJson
#9790 Support float case of format_number with format_float kernel
#10144 Support to_utc_timestamp
#10162 Fix Spark 334 Build
#10146 Refactor the window code so it is not mostly kept in a few very large files
#10155 Install procps tools for rocky docker images [skip ci]
#10153 Disable multi-threaded Maven
#10100 Enable to_date (via gettimestamp and casting timestamp to date) for non-UTC time zones
#10140 Removed Unnecessary Whitespaces From Spark 3.3.4 Shim [skip ci]
#10148 fix test_hash_agg_with_nan_keys floating point sum failure
#10150 Increase timeouts in HostAllocSuite to avoid timeout failures on slow machines
#10143 Fix test_window_aggs_for_batched_finite_row_windows_partitioned fail
#9887 Reduce time-consuming of pre-merge
#10130 Change unit tests that force ooms to specify the oom type (gpu
#10138 Update copyright dates in NOTICE files [skip ci]
#10139 Add Delta Lake 2.3.0 to list of versions to test for Spark 3.3.x
#10135 Fix CI: can't find script when there is pushd in script [skip ci]
#10137 Fix the canonicalizing for GPU file scan
#10132 Disable collect_list and collect_set for window by default
#10084 Refactor GpuJsonToStruct to reduce code duplication and manage resources more efficiently
#10087 Additional unit tests for GeneratedInternalRowToCudfRowIterator
#10082 Add Spark 3.3.4 Shim
#10054 Support Ascii function for ascii and latin-1
#10127 Fix merge conflict with branch-23.12
#10097 [DOC] Update docs for 23.12.1 release [skip ci]
#10109 Fixes a bug where datagen seed overrides were sticky and adds datagen_seed_override_disabled
#10093 Fix test_unsupported_fallback_regexp_replace
#10119 Fix from_utc_timestamp case failure on Cloudera when TZ is Iran
#10106 Add isFailed() to MockTaskContext and Remove MockTaskContextBase.scala
#10112 Remove datagen seed override for test_conditional_with_side_effects_cast
#10104 [DOC] Add in docs about memory debugging [skip ci]
#9925 Use threads, cache Scala compiler in GH mvn workflow
#9967 Added Spark-3.4.2 Shims
#10061 Use parse_url kernel for QUERY parsing
#10101 [DOC] Add column order error docs [skip ci]
#10078 Add perf test for non-UTC operators
#10096 Shim MockTaskContext to fix Spark 3.5.1 build
#10092 Implement Math.round using floor on GPU
#10085 Update tests that originally restricted the Spark timestamp range
#10090 Replace GPU-unsupported \z with an alternative RLIKE expression
#10095 Temporarily fix date format failed cases for non-UTC time zone.
#9999 Add some odd time zones for timezone transition tests
#9962 Add 3.5.1-SNAPSHOT Shim
#10071 Cleanup usage of non-utc configuration here
#10057 Add support for StringConcatFactory.makeConcatWithConstants (#9555)
#9996 Test full timestamp output range in PySpark
#10081 Add a fallback Cloudera Maven repo URL [skip ci]
#10065 Improve host memory spill interfaces
#10070 Fix 332db build failure
#10060 Fix failed cases for non-utc time zone
#10038 Remove spark.rapids.sql.nonUTC.enabled configuration option
#10059 Fixed Failing ToPrettyStringSuite Test for 3.5.0
#10013 Extended configuration of OOM injection mode
#10052 Set seed=0 for some integration test cases
#10053 Remove invalid user from CODEOWNER file [skip ci]
#10049 Fix out of range error from pySpark in test_timestamp_millis and other two integration test cases
#9721 Support date_format via Gpu for non-UTC time zone
#9845 Use parse_url kernel for HOST parsing
#10024 Support hour minute second for non-UTC time zone
#9973 Batching support for row-based bounded window functions
#10042 Update tests to not have hard coded fallback when not needed
#9816 Support unix_timestamp and to_unix_timestamp with non-UTC timezones (non-DST)
#9902 Some refactor for the Python UDF code
#10023 GPU supports yyyyMMdd format by post process for the from_unixtime function
#10033 Remove GpuToTimestampImproved and spark.rapids.sql.improvedTimeOps.enabled
#10016 Fix infinite loop in test_str_to_map_expr_random_delimiters
#10030 Update links in shims.md
#10015 Fix array_transform to not recompute the argument
#10011 Add cpu oom retry split handling to InternalRowToColumnarBatchIterator
#10019 Fix auto merge conflict 10010 [skip ci]
#9760 Support split broadcast join condition into ast and non-ast
#9827 Enable ORC timestamp and decimal predicate push down tests
#10002 Use Spark 3.3.3 instead of 3.3.2 for Scala 2.13 premerge builds
#10000 Optimize from_unixtime
#10003 Fix merge conflict with branch-23.12
#9984 Fix 340+(including DB341+) does not support casting date to integral/float
#9972 Fix year 0 is out of range in test_from_json_struct_timestamp
#9814 Support from_unixtime via Gpu for non-UTC time zone
#9929 Add host memory retries for GeneratedInternalRowToCudfRowIterator
#9957 Update cases for cast between integral and (date/time)
#9959 Append new authorized user to blossom-ci whitelist [skip ci]
#9942 Fix a potential data corruption for Pandas UDF
#9922 Fix allowMultipleJars recommend setting message
#9947 Fix merge conflict with branch-23.12
#9908 Register default allocator for host memory
#9944 Fix Java OOM caused by incorrect state of shouldCapture when exception occurred
#9937 Refactor to use CLASSIFIER instead of CUDA_CLASSIFIER [skip ci]
#9904 Params for build and test CI scripts on Databricks
#9719 Support fine grained timezone checker instead of type based
#9918 Prevent generation of 'year 0 is out of range' strings in IT
#9852 Avoid generating duplicate nan keys with MapGen(FloatGen)
#9674 Add cache action to speed up mvn workflow [skip ci]
#9900 Revert "Remove Databricks 13.3 from release 23.12 (#9890)"
#9888 Update nightly build and deploy script for arm artifacts [skip ci]
#9656 Update for new retry state machine JNI APIs
#9654 Detect multiple jars on the classpath when init plugin
#9857 Skip redundant steps in nightly build [skip ci]
#9812 Update JNI and private dep version to 24.02.0-SNAPSHOT

Release 23.12

Features

#6832 [FEA] Convert Timestamp/Timezone tests/checks to be per operator instead of generic
#9805 [FEA] Support current_date expression function with CST (UTC + 8) timezone support
#9515 [FEA] Support temporal types in to_json
#9872 [FEA][JSON] Support Decimal type in to_json
#9802 [FEA] Support FromUTCTimestamp on the GPU with a non-UTC time zone
#6831 [FEA] Support timestamp transitions to and from UTC for single time zones with no repeating rules
#9590 [FEA][JSON] Support temporal types in from_json
#9804 [FEA] Support CPU path for from_utc_timestamp function with timezone
#9461 [FEA] Validate nvcomp-3.0 with spark rapids plugin
#8832 [FEA] rewrite join conditions where only part of it can fit on the AST
#9059 [FEA] Support spark.sql.parquet.datetimeRebaseModeInRead=LEGACY
#9037 [FEA] Support spark.sql.parquet.int96RebaseModeInWrite=LEGACY
#9632 [FEA] Take into account org.apache.spark.timeZone in Parquet/Avro from Spark 3.2
#8770 [FEA] add more metrics to Eventlogs or Executor logs
#9597 [FEA][JSON] Support boolean type in from_json
#9516 [FEA] Add support for JSON data source option ignoreNullFields=false in to_json
#9520 [FEA] Add support for LAST() as running window function
#9518 [FEA] Add support for relevant JSON data source options in to_json
#9218 [FEA] Support stack function
#9532 [FEA] Support Delta Lake 2.3.0
#1525 [FEA] Support Scala 2.13
#7279 [FEA] Support OverwriteByExpressionExecV1 for Delta Lake
#9326 [FEA] Specify recover_with_null when reading JSON files
#8780 [FEA] Support to_json function
#7278 [FEA] Support AppendDataExecV1 for Delta Lake
#6266 [FEA] Support Percentile
#7277 [FEA] Support AtomicReplaceTableAsSelect for Delta Lake
#7276 [FEA] Support AtomicCreateTableAsSelect for Delta Lake

Performance

#8137 [FEA] Upgrade to UCX 1.15
#8157 [FEA] Add string comparison to AST expressions
#9398 [FEA] Compress/encrypt spill to disk

Bugs Fixed

#9687 [BUG] test_in_set fails when DATAGEN_SEED=1698940723
#9659 [BUG] executor crash intermittently in scala2.13-built spark332 integration tests
#9923 [BUG] Failed case about test_timestamp_seconds_rounding_necessary[Decimal(20,7)][DATAGEN_SEED=1701412018] – src.main.python.date_time_test
#9982 [BUG] test "convert large InternalRow iterator to cached batch single col" failed with arena pool
#9683 [BUG] test_map_scalars_supported_key_types fails with DATAGEN_SEED=1698940723
#9976 [BUG] test_part_write_round_trip[Float] Failed on -0.0 partition
#9948 [BUG] parquet reader data corruption in nested schema after rapidsai/cudf#13302
#9867 [BUG] Unable to use Spark Rapids with Spark Thrift Server
#9934 [BUG] test_delta_multi_part_write_round_trip_unmanaged and test_delta_part_write_round_trip_unmanaged failed DATA_SEED=1701608331
#9933 [BUG] collection_ops_test.py::test_sequence_too_long_sequence[Long(not_null)][DATAGEN_SEED=1701553915, INJECT_OOM]
#9837 [BUG] test_part_write_round_trip failed
#9932 [BUG] Failed test_multi_tier_ast[DATAGEN_SEED=1701445668] on CI
#9829 [BUG] Java OOM when testing non-UTC time zone with lots of cases fallback.
#9403 [BUG] test_cogroup_apply_udf[Short(not_null)] failed with pandas 2.1.X
#9684 [BUG] test_coalesce fails with DATAGEN_SEED=1698940723
#9685 [BUG] test_case_when fails with DATAGEN_SEED=1698940723
#9776 [BUG] fastparquet compatibility tests fail with data mismatch if TZ is not set and system timezone is not UTC
#9733 [BUG] Complex AST expressions can crash with non-matching operand type error
#9877 [BUG] Fix resource leak in to_json
#9722 [BUG] test_floor_scale_zero fails with DATAGEN_SEED=1700009407
#9846 [BUG] test_ceil_scale_zero may fail with different datagen_seed
#9781 [BUG] test_cast_string_date_valid_format fails on DATAGEN_SEED=1700250017
#9714 Scala Map class not found when executing the benchmark on Spark 3.5.0 with Scala 2.13
#9856 collection_ops_test.py failed on Dataproc-2.1 with: Column 'None' does not exist
#9397 [BUG] RapidsShuffleManager MULTITHREADED on Databricks, we see loss of executors due to Rpc issues
#9738 [BUG] test_delta_part_write_round_trip_unmanaged and test_delta_multi_part_write_round_trip_unmanaged fail with DATAGEN_SEED=1700105176
#9771 [BUG] ast_test.py::test_X[(String, True)][DATAGEN_SEED=1700205785] failed
#9782 [BUG] Error messages appear in a clean build
#9798 [BUG] GpuCheckOverflowInTableInsert should be added to databricks shim
#9820 [BUG] test_parquet_write_roundtrip_datetime_with_legacy_rebase fails with "year 0 is out of range"
#9817 [BUG] FAILED dpp_test.py::test_dpp_reuse_broadcast_exchange[false-0-parquet][DATAGEN_SEED=1700572856, IGNORE_ORDER]
#9768 [BUG] cast decimal to string ScalaTest relies on a side effects
#9711 [BUG] test_lte fails with DATAGEN_SEED=1699987762
#9751 [BUG] cmp_test test_gte failed with DATAGEN_SEED=1700149611
#9469 [BUG] [main] ERROR com.nvidia.spark.rapids.GpuOverrideUtil - Encountered an exception applying GPU overrides java.lang.IllegalStateException: the broadcast must be on the GPU too
#9648 [BUG] Existence default values in schema are not being honored
#9676 Fix Delta Lake Integration tests; test_delta_atomic_create_table_as_select and test_delta_atomic_replace_table_as_select
#9701 [BUG] test_ts_formats_round_trip and test_datetime_roundtrip_with_legacy_rebase fail with DATAGEN_SEED=1699915317
#9691 [BUG] Repeated Maven invocations w/o changes recompile too many Scala sources despite recompileMode=incremental
#9547 Update buildall and doc to generate bloop projects for test debugging
#9697 [BUG] Iceberg multiple file readers can not read files if the file paths contain encoded URL unsafe chars
#9681 Databricks Build Failing For 330db+
#9521 [BUG] Multi Threaded Shuffle Writer needs flow control
#9675 Failing Delta Lake Tests for Databricks 13.3 Due to WriteIntoDeltaCommand
#9669 [BUG] Rebase exception states not in UTC but timezone is Etc/UTC
#7940 [BUG] UCX peer connection issue in multi-nic single node cluster
#9650 [BUG] Github workflow for missing scala2.13 updates fails to detect when pom is new
#9621 [BUG] Scala 2.13 with-classifier profile is picking up Scala2.12 spark.version
#9636 [BUG] All parquet integration tests failed "Part of the plan is not columnar class" in databricks runtimes
#9108 [BUG] nullability on some decimal operations is wrong
#9625 [BUG] Typo in github Maven check install-modules
#9603 [BUG] fastparquet_compatibility_test fails on dataproc
#8729 [BUG] nightly integration test failed OOM kill in JDK11 ENV
#9589 [BUG] Scala 2.13 build hard-codes Java 8 target
#9581 Delta Lake 2.4 missing equals/hashCode override for file format and some metrics for merge
#9507 [BUG] Spark 3.2+/ParquetFilterSuite/Parquet filter pushdown - timestamp/ FAILED
#9540 [BUG] Job failed with SparkUpgradeException no matter which value is set for spark.sql.parquet.datetimeRebaseModeInRead
#9545 [BUG] Dataproc 2.0 test_reading_file_rewritten_with_fastparquet tests failing
#9552 [BUG] Inconsistent CDH dependency overrides across submodules
#9571 [BUG] non-deterministic compiled SQLExecPlugin.class with scala 2.13 deployment
#9569 [BUG] test_window_running failed in 3.1.2+3.1.3
#9480 [BUG] mapInPandas doesn't invoke udf on empty partitions
#8644 [BUG] Parquet file with malformed dictionary does not error when loaded
#9310 [BUG] Improve support for reading JSON files with malformed rows
#9457 [BUG] CDH 332 unit tests failing
#9404 [BUG] Spark reports a decimal error when create lit scalar when generate Decimal(34, -5) data.
#9110 [BUG] GPU Reader fails due to partition column creating column larger than cudf column size limit
#8631 [BUG] Parquet load failure on repeated_no_annotation.parquet
#9364 [BUG] CUDA illegal access error is triggering split and retry logic

PRs

#10384 [DOC] Update docs for 23.12.2 release [skip ci]
#10341 Update changelog for v23.12.2 [skip ci]
#10340 Copyright to 2024 [skip ci]
#10323 Upgrade version to 23.12.2-SNAPSHOT
#10329 update download page for v23.12.2 release [skip ci]
#10274 PythonRunner Changes
#10124 Update changelog for v23.12.1 [skip ci]
#10123 Change version to v23.12.1 [skip ci]
#10122 Init changelog for v23.12.1 [skip ci]
#10121 [DOC] update download page for db hot fix [skip ci]
#10116 Upgrade to 23.12.1-SNAPSHOT
#10069 Revert "Support split broadcast join condition into ast and non-ast […
#9470 Use float to string kernel
#9481 Use parse_url kernel for PROTOCOL parsing
#9935 Init 23.12 changelog [skip ci]
#9943 [DOC] Update docs for 23.12.0 release [skip ci]
#10014 Add documentation for how to run tests with a fixed datagen seed [skip ci]
#9954 Update private and JNI version to released 23.12.0
#10009 Using fixed seed to unblock 23.12 release; Move the blocked issues to 24.02
#10007 Fix Java OOM in non-UTC case with lots of xfail (#9944)
#9985 Avoid allocating GPU memory out of RMM managed pool in test
#9970 Avoid leading and trailing zeros in test_timestamp_seconds_rounding_necessary
#9978 Avoid using floating point values as partition values in tests
#9979 Add compatibility notes for writing ORC with lost Gregorian days [skip ci]
#9949 Override the seed for test_map_scalars_supported_key_types for version of Spark before 3.4.0 [Databricks]
#9961 Avoid using floating point for partition values in Delta Lake tests
#9960 Fix LongGen accidentally using special cases when none are desired
#9950 Avoid generating NaNs as partition values in test_part_write_round_trip
#9940 Fix 'year 0 is out of range' by setting a fixed seed
#9946 Fix test_multi_tier_ast to ignore ordering of output rows
#9928 Test inset with NaN only for Spark from 3.1.3
#9906 Fix test_initcap to use the intended limited character set
#9831 Skip fastparquet timestamp tests when plugin cannot read/write timestamps
#9893 Add multiple expression tier regression test for AST
#9889 Fix test_cast_string_ts_valid_format test
#9833 Fix a hang for Pandas UDFs on DB 13.3
#9873 Add support for decimal in to_json
#9890 Remove Databricks 13.3 from release 23.12
#9874 Fix zero-scale floor and ceil tests
#9879 Fix resource leak in to_json
#9600 Add date and timestamp support to to_json
#9871 Fix test_cast_string_date_valid_format generating year 0
#9885 Preparation for non-UTC nightly CI [skip ci]
#9810 Support from_utc_timestamp on the GPU for non-UTC timezones (non-DST)
#9865 Fix problems with nulls in sequence tests
#9864 Add compatibility documentation with respect to decimal overflow detection [skip ci]
#9860 Fixing FAQ deadlink in plugin code [skip ci]
#9840 Avoid using NaNs as Delta Lake partition values
#9773 xfail all the impacted cases when using non-UTC time zone
#9849 Instantly Delete pre-merge content of stage workspace if success
#9848 Force datagen_seed for test_ceil_scale_zero and test_decimal_round
#9677 Enable build for Databricks 13.3
#9809 Re-enable AST string integration cases
#9835 Avoid pre-Gregorian dates in schema_evolution_test
#9786 Check paths for existence to prevent ignorable error messages during build
#9824 UCX 1.15 upgrade
#9800 Add GpuCheckOverflowInTableInsert to Databricks 11.3+
#9821 Update timestamp gens to avoid "year 0 is out of range" errors
#9826 Set seed to 0 for test_hash_reduction_sum
#9720 Support timestamp in from_json
#9818 Specify nullable=False when generating filter values in dpp tests
#9689 Support CPU path for from_utc_timestamp function with timezone
#9769 Use withGpuSparkSession to customize SparkConf
#9780 Fix NaN handling in GpuLessThanOrEqual and GpuGreaterThanOrEqual
#9795 xfail AST string tests
#9666 Add support for parsing strings as dates in from_json
#9673 Fix the broadcast joins issues caused by InputFileBlockRule
#9785 Force datagen_seed for 9781 and 9784 [skip ci]
#9765 Let GPU scans fall back when default values exist in schema
#9729 Fix Delta Lake atomic table operations on spark341db
#9770 [BUG] Fix the doc for Maven and Scala 2.13 test example [skip ci]
#9761 Fix bug in tagging of JsonToStructs
#9758 Remove forced seed from Delta Lake part_write_round_trip_unmanaged tests
#9652 Add time zone config to set non-UTC
#9736 Fix TimestampGen to generate value not too close to the minimum allowed timestamp
#9698 Speed up build: unnecessary invalidation in the incremental recompile mode
#9748 Fix Delta Lake part_write_round_trip_unmanaged tests with floating point
#9702 Support split BroadcastNestedLoopJoin condition for AST and non-AST
#9746 Force test_hypot to be single seed for now
#9745 Avoid generating null filter values in test_delta_dfp_reuse_broadcast_exchange
#9741 Set seed=0 for the delta lake part roundtrip tests
#9660 Fully support date/time legacy rebase for nested input
#9672 Support String type for AST
#9716 Initiate project version 24.02.0-SNAPSHOT
#9732 Temporarily force datagen_seed=0 for test_re_replace_all to unblock CI
#9726 Fix leak in BatchWithPartitionData
#9717 Encode the file path from Iceberg when converting to a PartitionedFile
#9441 Add a random seed specific to datagen cases
#9649 Support spark.sql.parquet.datetimeRebaseModeInRead=LEGACY and spark.sql.parquet.int96RebaseModeInRead=LEGACY
#9612 Escape quotes and newlines when converting strings to json format in to_json
#9644 Add Partial Delta Lake Support for Databricks 13.3
#9690 Changed extractExecutedPlan to consider ResultQueryStageExec for Databricks 13.3
#9686 Removed Maven Profiles From tests/pom.xml
#9509 Fine-grained spill metrics
#9658 Support spark.sql.parquet.int96RebaseModeInWrite=LEGACY
#9695 Revert "Support split non-AST-able join condition for BroadcastNested…
#9693 Enable automerge from 23.12 to 24.02 [skip ci]
#9679 [Doc] update the dead link in download page [skip ci]
#9678 Add flow control for multithreaded shuffle writer
#9635 Support split non-AST-able join condition for BroadcastNestedLoopJoin
#9646 Fix Integration Test Failures for Databricks 13.3 Support
#9670 Normalize file timezone and handle missing file timezone in datetimeRebaseUtils
#9657 Update verify check to handle new pom files [skip ci]
#9663 Making User Guide info in bold and adding it as top right link in github.io [skip ci]
#9609 Add valid retry solution to mvn-verify [skip ci]
#9655 Document problem with handling of invalid characters in CSV reader
#9620 Add support for parsing boolean values in from_json
#9615 Bloop updates - require JDK11 in buildall + docs, build bloop for all targets.
#9631 Refactor Parquet readers
#9637 Added Support For Various Execs for Databricks 13.3
#9640 Add support for ignoreNullFields=false in to_json
#9623 Running window optimization for LAST()
#9641 Revert "Support rebase checking for nested dates and timestamps (#9617)"
#9423 Re-enable from_json / JsonToStructs
#9624 Add jenkins-level retry for pre-merge build in databricks runtimes
#9608 Fix nullability issues for some decimal operations
#9617 Support rebase checking for nested dates and timestamps
#9611 Move simple classes after refactoring to sql-plugin-api
#9618 Remove unused dataTypes argument from HostShuffleCoalesceIterator
#9626 Fix ENV typo in pre-merge github actions [skip ci]
#9593 PythonRunner and RapidsErrorUtils Changes For Databricks 13.3
#9607 Integration tests: Install specific fastparquet version.
#9610 Propagate local properties to broadcast execs
#9544 Support batching for RANGE running window aggregations. Including on
#9601 Remove usage of deprecated scala.Proxy
#9591 Enable implicit JDK profile activation
#9586 Merge metrics and file format fixes to Delta 2.4 support
#9594 Revert "Ignore failing Parquet filter test to unblock CI (#9519)"
#9454 Support encryption and compression in disk store
#9439 Support stack function
#9583 Fix fastparquet tests to work with HDFS
#9508 Consolidate deps switching in an intermediate pom
#9562 Delta Lake 2.3.0 support
#9576 Move Stack classes to wrapper classes to fix non-deterministic build issue
#9572 Add retry for CrossJoinIterator and ConditionalNestedLoopJoinIterator
#9575 Fix test_window_running*() for NTH_VALUE IGNORE NULLS.
#9574 Fix broken #endif scala comments [skip ci]
#9568 Enforce Apache Spark 3.3.0+ for Scala 2.13
#9557 Support launching Map Pandas UDF on empty partitions
#9489 Batching support for ROW-based FIRST() window function
#9510 Add Databricks 13.3 shim boilerplate code and refactor Databricks 12.2 shim
#9554 Fix fastparquet installation for
#9536 Add CPU POC of TimeZoneDB; Test some time zones by comparing CPU POC and Spark
#9558 Support integration test against scala2.13 spark binaries [skip ci]
#8592 Scala 2.13 Support
#9551 Enable malformed Parquet failure test
#9546 Support OverwriteByExpressionExecV1 for Delta Lake tables
#9527 Support Split And Retry for GpuProjectAstExec
#9541 Move simple classes to API
#9548 Append new authorized user to blossom-ci whitelist [skip ci]
#9418 Fix STRUCT comparison between Pandas and Spark dataframes in fastparquet tests
#9468 Add SplitAndRetry to GpuRunningWindowIterator
#9486 Add partial support for to_json
#9538 Fix tiered project breaking higher order functions
#9539 Add delta-24x to delta-lake/README.md [skip ci]
#9534 Add pyarrow tests for Databricks runtime
#9444 Remove redundant pass-through shuffle manager classes
#9531 Fix relative path for spark-shell nightly test [skip ci]
#9525 Follow-up to dbdeps consolidation
#9506 Move ProxyShuffleInternalManagerBase to api
#9504 Add a spark-shell smoke test to premerge and nightly
#9519 Ignore failing Parquet filter test to unblock CI
#9478 Support AppendDataExecV1 for Delta Lake tables
#9366 Add tests to check compatibility with fastparquet
#9419 Add retry to RoundRobin Partitioner and Range Partitioner
#9502 Install Dependencies Needed For Databricks 13.3
#9296 Implement percentile aggregation
#9488 Add Shim JSON Headers for Databricks 13.3
#9443 Add AtomicReplaceTableAsSelectExec support for Delta Lake
#9476 Refactor common Delta Lake test code
#9463 Fix Cloudera 3.3.2 shim for handling CheckOverflowInTableInsert and orc zstd support
#9460 Update links in old release notes to new doc locations [skip ci]
#9405 Wrap scalar generation into spark session in integration test
#9459 Fix 332cdh build [skip ci]
#9425 Add support for AtomicCreateTableAsSelect with Delta Lake
#9434 Add retry support to HostToGpuCoalesceIterator.concatAllAndPutOnGPU
#9453 Update codeowner and blossom-ci ACL [skip ci]
#9396 Add support for Cloudera CDS-3.3.2
#9380 Fix parsing of Parquet legacy list-of-struct format
#9438 Fix auto merge conflict 9437 [skip ci]
#9424 Refactor aggregate functions
#9414 Add retry to GpuHashJoin.filterNulls
#9388 Add developer documentation about working with data sources [skip ci]
#9369 Improve JSON empty row fix to use less memory
#9373 Fix auto merge conflict 9372
#9308 Initiate arm64 CI support [skip ci]
#9292 Init project version 23.12.0-SNAPSHOT

Older Releases

Changelog of older releases can be found at docs/archives