forked from rapidsai/cudf
-
Notifications
You must be signed in to change notification settings - Fork 0
/
CHANGELOG.md
7262 lines (6911 loc) · 749 KB
/
CHANGELOG.md
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
# cuDF 24.02.00 (12 Feb 2024)
## 🚨 Breaking Changes
- Remove **kwargs from astype ([#14765](https://github.com/rapidsai/cudf/pull/14765)) [@mroeschke](https://github.com/mroeschke)
- Remove mimesis as a testing dependency ([#14723](https://github.com/rapidsai/cudf/pull/14723)) [@mroeschke](https://github.com/mroeschke)
- Update to Dask's `shuffle_method` kwarg ([#14708](https://github.com/rapidsai/cudf/pull/14708)) [@pentschev](https://github.com/pentschev)
- Drop Pascal GPU support. ([#14630](https://github.com/rapidsai/cudf/pull/14630)) [@bdice](https://github.com/bdice)
- Update to CCCL 2.2.0. ([#14576](https://github.com/rapidsai/cudf/pull/14576)) [@bdice](https://github.com/bdice)
- Expunge as_frame conversions in Column algorithms ([#14491](https://github.com/rapidsai/cudf/pull/14491)) [@wence-](https://github.com/wence-)
- Deprecate cudf::make_strings_column accepting typed offsets ([#14461](https://github.com/rapidsai/cudf/pull/14461)) [@davidwendt](https://github.com/davidwendt)
- Remove deprecated nvtext::load_merge_pairs_file ([#14460](https://github.com/rapidsai/cudf/pull/14460)) [@davidwendt](https://github.com/davidwendt)
- Include writer code and writerVersion in ORC files ([#14458](https://github.com/rapidsai/cudf/pull/14458)) [@vuule](https://github.com/vuule)
- Remove null mask for zero nulls in json readers ([#14451](https://github.com/rapidsai/cudf/pull/14451)) [@karthikeyann](https://github.com/karthikeyann)
- REF: Remove **kwargs from to_pandas, raise if nullable is not implemented ([#14438](https://github.com/rapidsai/cudf/pull/14438)) [@mroeschke](https://github.com/mroeschke)
- Consolidate 1D pandas object handling in as_column ([#14394](https://github.com/rapidsai/cudf/pull/14394)) [@mroeschke](https://github.com/mroeschke)
- Move chars column to parent data buffer in strings column ([#14202](https://github.com/rapidsai/cudf/pull/14202)) [@karthikeyann](https://github.com/karthikeyann)
- Switch to scikit-build-core ([#13531](https://github.com/rapidsai/cudf/pull/13531)) [@vyasr](https://github.com/vyasr)
## 🐛 Bug Fixes
- Exclude tests from builds ([#14981](https://github.com/rapidsai/cudf/pull/14981)) [@vyasr](https://github.com/vyasr)
- Fix the bounce buffer size in ORC writer ([#14947](https://github.com/rapidsai/cudf/pull/14947)) [@vuule](https://github.com/vuule)
- Revert sum/product aggregation to always produce `int64_t` type ([#14907](https://github.com/rapidsai/cudf/pull/14907)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- Fixed an issue with output chunking computation stemming from input chunking. ([#14889](https://github.com/rapidsai/cudf/pull/14889)) [@nvdbaranec](https://github.com/nvdbaranec)
- Fix total_byte_size in Parquet row group metadata ([#14802](https://github.com/rapidsai/cudf/pull/14802)) [@etseidl](https://github.com/etseidl)
- Fix index difference to follow the pandas format ([#14789](https://github.com/rapidsai/cudf/pull/14789)) [@amiralimi](https://github.com/amiralimi)
- Fix shared-workflows repo name ([#14784](https://github.com/rapidsai/cudf/pull/14784)) [@raydouglass](https://github.com/raydouglass)
- Remove unparseable attributes from all nodes ([#14780](https://github.com/rapidsai/cudf/pull/14780)) [@vyasr](https://github.com/vyasr)
- Refactor and add validation to IntervalIndex.__init__ ([#14778](https://github.com/rapidsai/cudf/pull/14778)) [@mroeschke](https://github.com/mroeschke)
- Work around incompatibilities between V2 page header handling and zStandard compression in Parquet writer ([#14772](https://github.com/rapidsai/cudf/pull/14772)) [@etseidl](https://github.com/etseidl)
- Fix calls to deprecated strings factory API ([#14771](https://github.com/rapidsai/cudf/pull/14771)) [@davidwendt](https://github.com/davidwendt)
- Fix ptx file discovery in editable installs ([#14767](https://github.com/rapidsai/cudf/pull/14767)) [@vyasr](https://github.com/vyasr)
- Revise ``shuffle`` deprecation to align with dask/dask ([#14762](https://github.com/rapidsai/cudf/pull/14762)) [@rjzamora](https://github.com/rjzamora)
- Enable intermediate proxies to be picklable ([#14752](https://github.com/rapidsai/cudf/pull/14752)) [@shwina](https://github.com/shwina)
- Add CUDF_TEST_PROGRAM_MAIN macro to tests lacking it ([#14751](https://github.com/rapidsai/cudf/pull/14751)) [@etseidl](https://github.com/etseidl)
- Fix CMake args ([#14746](https://github.com/rapidsai/cudf/pull/14746)) [@vyasr](https://github.com/vyasr)
- Fix logic bug introduced in #14730 ([#14742](https://github.com/rapidsai/cudf/pull/14742)) [@wence-](https://github.com/wence-)
- [Java] Choose The Correct RoundingMode For Checking Decimal OutOfBounds ([#14731](https://github.com/rapidsai/cudf/pull/14731)) [@razajafri](https://github.com/razajafri)
- Fix ``Groupby.get_group`` ([#14728](https://github.com/rapidsai/cudf/pull/14728)) [@rjzamora](https://github.com/rjzamora)
- Ensure that all CUDA kernels in cudf have hidden visibility. ([#14726](https://github.com/rapidsai/cudf/pull/14726)) [@robertmaynard](https://github.com/robertmaynard)
- Split cuda versions for notebook testing ([#14722](https://github.com/rapidsai/cudf/pull/14722)) [@raydouglass](https://github.com/raydouglass)
- Fix to_numeric not preserving Series index and name ([#14718](https://github.com/rapidsai/cudf/pull/14718)) [@mroeschke](https://github.com/mroeschke)
- Update dask-cudf wheel name ([#14713](https://github.com/rapidsai/cudf/pull/14713)) [@raydouglass](https://github.com/raydouglass)
- Fix strings::contains matching end of string target ([#14711](https://github.com/rapidsai/cudf/pull/14711)) [@davidwendt](https://github.com/davidwendt)
- Update to Dask's `shuffle_method` kwarg ([#14708](https://github.com/rapidsai/cudf/pull/14708)) [@pentschev](https://github.com/pentschev)
- Write file-level statistics when writing ORC files with zero rows ([#14707](https://github.com/rapidsai/cudf/pull/14707)) [@vuule](https://github.com/vuule)
- Potential fix for peformance regression in #14415 ([#14706](https://github.com/rapidsai/cudf/pull/14706)) [@etseidl](https://github.com/etseidl)
- Ensure DataFrame column types are preserved during serialization ([#14705](https://github.com/rapidsai/cudf/pull/14705)) [@mroeschke](https://github.com/mroeschke)
- Skip numba test that fails on ARM ([#14702](https://github.com/rapidsai/cudf/pull/14702)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Allow Z in datetime string parsing in non pandas compat mode ([#14701](https://github.com/rapidsai/cudf/pull/14701)) [@mroeschke](https://github.com/mroeschke)
- Fix nan_as_null not being respected when passing arrow object ([#14688](https://github.com/rapidsai/cudf/pull/14688)) [@mroeschke](https://github.com/mroeschke)
- Fix constructing Series/Index from arrow array and dtype ([#14686](https://github.com/rapidsai/cudf/pull/14686)) [@mroeschke](https://github.com/mroeschke)
- Fix Aggregation Type Promotion: Ensure Unsigned Input Types Result in Unsigned Output for Sum and Multiply ([#14679](https://github.com/rapidsai/cudf/pull/14679)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- Add BaseOffset as a final proxy type to pass instancechecks for offsets against `BaseOffset` ([#14678](https://github.com/rapidsai/cudf/pull/14678)) [@shwina](https://github.com/shwina)
- Add row conversion code from spark-rapids-jni ([#14664](https://github.com/rapidsai/cudf/pull/14664)) [@ttnghia](https://github.com/ttnghia)
- Unconditionally export the CCCL path ([#14656](https://github.com/rapidsai/cudf/pull/14656)) [@vyasr](https://github.com/vyasr)
- Ensure libcudf searches for our patched version of CCCL first ([#14655](https://github.com/rapidsai/cudf/pull/14655)) [@robertmaynard](https://github.com/robertmaynard)
- Constrain CUDA in notebook testing to prevent CUDA 12.1 usage until we have pynvjitlink ([#14648](https://github.com/rapidsai/cudf/pull/14648)) [@vyasr](https://github.com/vyasr)
- Fix invalid memory access in Parquet reader ([#14637](https://github.com/rapidsai/cudf/pull/14637)) [@etseidl](https://github.com/etseidl)
- Use column_empty over as_column([]) ([#14632](https://github.com/rapidsai/cudf/pull/14632)) [@mroeschke](https://github.com/mroeschke)
- Add (implicit) handling for torch tensors in is_scalar ([#14623](https://github.com/rapidsai/cudf/pull/14623)) [@wence-](https://github.com/wence-)
- Fix astype/fillna not maintaining column subclass and types ([#14615](https://github.com/rapidsai/cudf/pull/14615)) [@mroeschke](https://github.com/mroeschke)
- Remove non-empty nulls in cudf::get_json_object ([#14609](https://github.com/rapidsai/cudf/pull/14609)) [@davidwendt](https://github.com/davidwendt)
- Remove `cuda::proclaim_return_type` from nested lambda ([#14607](https://github.com/rapidsai/cudf/pull/14607)) [@ttnghia](https://github.com/ttnghia)
- Fix DataFrame.reindex when column reindexing to MultiIndex/RangeIndex ([#14605](https://github.com/rapidsai/cudf/pull/14605)) [@mroeschke](https://github.com/mroeschke)
- Address potential race conditions in Parquet reader ([#14602](https://github.com/rapidsai/cudf/pull/14602)) [@etseidl](https://github.com/etseidl)
- Fix DataFrame.reindex removing column name ([#14601](https://github.com/rapidsai/cudf/pull/14601)) [@mroeschke](https://github.com/mroeschke)
- Remove unsanitized input test data from copy gtests ([#14600](https://github.com/rapidsai/cudf/pull/14600)) [@davidwendt](https://github.com/davidwendt)
- Fix race detected in Parquet writer ([#14598](https://github.com/rapidsai/cudf/pull/14598)) [@etseidl](https://github.com/etseidl)
- Correct invalid or missing return types ([#14587](https://github.com/rapidsai/cudf/pull/14587)) [@robertmaynard](https://github.com/robertmaynard)
- Fix unsanitized nulls from strings segmented-reduce ([#14586](https://github.com/rapidsai/cudf/pull/14586)) [@davidwendt](https://github.com/davidwendt)
- Upgrade to nvCOMP 3.0.5 ([#14581](https://github.com/rapidsai/cudf/pull/14581)) [@davidwendt](https://github.com/davidwendt)
- Fix unsanitized nulls produced by `cudf::clamp` APIs ([#14580](https://github.com/rapidsai/cudf/pull/14580)) [@davidwendt](https://github.com/davidwendt)
- Fix unsanitized nulls produced by libcudf dictionary decode ([#14578](https://github.com/rapidsai/cudf/pull/14578)) [@davidwendt](https://github.com/davidwendt)
- Fixes a symbol group lookup table issue ([#14561](https://github.com/rapidsai/cudf/pull/14561)) [@elstehle](https://github.com/elstehle)
- Drop llvm16 from cuda118-conda devcontainer image ([#14526](https://github.com/rapidsai/cudf/pull/14526)) [@charlesbluca](https://github.com/charlesbluca)
- REF: Make DataFrame.from_pandas process by column ([#14483](https://github.com/rapidsai/cudf/pull/14483)) [@mroeschke](https://github.com/mroeschke)
- Improve memory footprint of isin by using contains ([#14478](https://github.com/rapidsai/cudf/pull/14478)) [@wence-](https://github.com/wence-)
- Move creation of env.yaml outside the current directory ([#14476](https://github.com/rapidsai/cudf/pull/14476)) [@davidwendt](https://github.com/davidwendt)
- Enable `pd.Timestamp` objects to be picklable when `cudf.pandas` is active ([#14474](https://github.com/rapidsai/cudf/pull/14474)) [@shwina](https://github.com/shwina)
- Correct dtype of count aggregations on empty dataframes ([#14473](https://github.com/rapidsai/cudf/pull/14473)) [@wence-](https://github.com/wence-)
- Avoid DataFrame conversion in `MultiIndex.from_pandas` ([#14470](https://github.com/rapidsai/cudf/pull/14470)) [@mroeschke](https://github.com/mroeschke)
- JSON writer: avoid default stream use in `string_scalar` constructors ([#14444](https://github.com/rapidsai/cudf/pull/14444)) [@vuule](https://github.com/vuule)
- Fix default stream use in the CSV reader ([#14443](https://github.com/rapidsai/cudf/pull/14443)) [@vuule](https://github.com/vuule)
- Preserve DataFrame(columns=).columns dtype during empty-like construction ([#14381](https://github.com/rapidsai/cudf/pull/14381)) [@mroeschke](https://github.com/mroeschke)
- Defer PTX file load to runtime ([#13690](https://github.com/rapidsai/cudf/pull/13690)) [@brandon-b-miller](https://github.com/brandon-b-miller)
## 📖 Documentation
- Disable parallel build ([#14796](https://github.com/rapidsai/cudf/pull/14796)) [@vyasr](https://github.com/vyasr)
- Add pylibcudf to the docs ([#14791](https://github.com/rapidsai/cudf/pull/14791)) [@vyasr](https://github.com/vyasr)
- Describe unpickling expectations when cudf.pandas is enabled ([#14693](https://github.com/rapidsai/cudf/pull/14693)) [@shwina](https://github.com/shwina)
- Update CONTRIBUTING for pyproject-only builds ([#14653](https://github.com/rapidsai/cudf/pull/14653)) [@vyasr](https://github.com/vyasr)
- More doxygen fixes ([#14639](https://github.com/rapidsai/cudf/pull/14639)) [@vyasr](https://github.com/vyasr)
- Enable doxygen XML generation and fix issues ([#14477](https://github.com/rapidsai/cudf/pull/14477)) [@vyasr](https://github.com/vyasr)
- Some doxygen improvements ([#14469](https://github.com/rapidsai/cudf/pull/14469)) [@vyasr](https://github.com/vyasr)
- Remove warning in dask-cudf docs ([#14454](https://github.com/rapidsai/cudf/pull/14454)) [@wence-](https://github.com/wence-)
- Update README links with redirects. ([#14378](https://github.com/rapidsai/cudf/pull/14378)) [@bdice](https://github.com/bdice)
- Add pip install instructions to README ([#13677](https://github.com/rapidsai/cudf/pull/13677)) [@shwina](https://github.com/shwina)
## 🚀 New Features
- Add ci check for external kernels ([#14768](https://github.com/rapidsai/cudf/pull/14768)) [@robertmaynard](https://github.com/robertmaynard)
- JSON single quote normalization API ([#14729](https://github.com/rapidsai/cudf/pull/14729)) [@shrshi](https://github.com/shrshi)
- Write cuDF version in Parquet "created_by" metadata field ([#14721](https://github.com/rapidsai/cudf/pull/14721)) [@etseidl](https://github.com/etseidl)
- Implement remaining copying APIs in pylibcudf along with required helper functions ([#14640](https://github.com/rapidsai/cudf/pull/14640)) [@vyasr](https://github.com/vyasr)
- Don't constrain `numba<0.58` ([#14616](https://github.com/rapidsai/cudf/pull/14616)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Add DELTA_LENGTH_BYTE_ARRAY encoder and decoder for Parquet ([#14590](https://github.com/rapidsai/cudf/pull/14590)) [@etseidl](https://github.com/etseidl)
- JSON - Parse mixed types as string in JSON reader ([#14572](https://github.com/rapidsai/cudf/pull/14572)) [@karthikeyann](https://github.com/karthikeyann)
- JSON quote normalization ([#14545](https://github.com/rapidsai/cudf/pull/14545)) [@shrshi](https://github.com/shrshi)
- Make DefaultHostMemoryAllocator settable ([#14523](https://github.com/rapidsai/cudf/pull/14523)) [@gerashegalov](https://github.com/gerashegalov)
- Implement more copying APIs in pylibcudf ([#14508](https://github.com/rapidsai/cudf/pull/14508)) [@vyasr](https://github.com/vyasr)
- Include writer code and writerVersion in ORC files ([#14458](https://github.com/rapidsai/cudf/pull/14458)) [@vuule](https://github.com/vuule)
- Parquet sub-rowgroup reading. ([#14360](https://github.com/rapidsai/cudf/pull/14360)) [@nvdbaranec](https://github.com/nvdbaranec)
- Move chars column to parent data buffer in strings column ([#14202](https://github.com/rapidsai/cudf/pull/14202)) [@karthikeyann](https://github.com/karthikeyann)
- PARQUET-2261 Size Statistics ([#14000](https://github.com/rapidsai/cudf/pull/14000)) [@etseidl](https://github.com/etseidl)
- Improve GroupBy JIT error handling ([#13854](https://github.com/rapidsai/cudf/pull/13854)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Generate unified Python/C++ docs ([#13846](https://github.com/rapidsai/cudf/pull/13846)) [@vyasr](https://github.com/vyasr)
- Expand JIT groupby test suite ([#13813](https://github.com/rapidsai/cudf/pull/13813)) [@brandon-b-miller](https://github.com/brandon-b-miller)
## 🛠️ Improvements
- Pin `pytest<8` ([#14920](https://github.com/rapidsai/cudf/pull/14920)) [@galipremsagar](https://github.com/galipremsagar)
- Move cudf::char_utf8 definition from detail to public header ([#14779](https://github.com/rapidsai/cudf/pull/14779)) [@davidwendt](https://github.com/davidwendt)
- Clean up `TimedeltaIndex.__init__` constructor ([#14775](https://github.com/rapidsai/cudf/pull/14775)) [@mroeschke](https://github.com/mroeschke)
- Clean up `DatetimeIndex.__init__` constructor ([#14774](https://github.com/rapidsai/cudf/pull/14774)) [@mroeschke](https://github.com/mroeschke)
- Some `frame.py` typing, move seldom used methods in `frame.py` ([#14766](https://github.com/rapidsai/cudf/pull/14766)) [@mroeschke](https://github.com/mroeschke)
- Remove **kwargs from astype ([#14765](https://github.com/rapidsai/cudf/pull/14765)) [@mroeschke](https://github.com/mroeschke)
- fix benchmarks compatibility with newer pytest-cases ([#14764](https://github.com/rapidsai/cudf/pull/14764)) [@jameslamb](https://github.com/jameslamb)
- Add `pynvjitlink` as a dependency ([#14763](https://github.com/rapidsai/cudf/pull/14763)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Resolve degenerate performance in `create_structs_data` ([#14761](https://github.com/rapidsai/cudf/pull/14761)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- Simplify ColumnAccessor methods; avoid unnecessary validations ([#14758](https://github.com/rapidsai/cudf/pull/14758)) [@mroeschke](https://github.com/mroeschke)
- Pin pytest-cases<3.8.2 ([#14756](https://github.com/rapidsai/cudf/pull/14756)) [@mroeschke](https://github.com/mroeschke)
- Use _from_data instead of _from_columns for initialzing Frame ([#14755](https://github.com/rapidsai/cudf/pull/14755)) [@mroeschke](https://github.com/mroeschke)
- Consolidate cudf object handling in as_column ([#14754](https://github.com/rapidsai/cudf/pull/14754)) [@mroeschke](https://github.com/mroeschke)
- Reduce execution time of Parquet C++ tests ([#14750](https://github.com/rapidsai/cudf/pull/14750)) [@vuule](https://github.com/vuule)
- Implement to_datetime(..., utc=True) ([#14749](https://github.com/rapidsai/cudf/pull/14749)) [@mroeschke](https://github.com/mroeschke)
- Remove usages of rapids-env-update ([#14748](https://github.com/rapidsai/cudf/pull/14748)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)
- Provide explicit pool size and avoid RMM detail APIs ([#14741](https://github.com/rapidsai/cudf/pull/14741)) [@harrism](https://github.com/harrism)
- Implement `cudf.MultiIndex.from_arrays` ([#14740](https://github.com/rapidsai/cudf/pull/14740)) [@mroeschke](https://github.com/mroeschke)
- Remove unused/single use methods ([#14739](https://github.com/rapidsai/cudf/pull/14739)) [@mroeschke](https://github.com/mroeschke)
- refactor CUDA versions in dependencies.yaml ([#14733](https://github.com/rapidsai/cudf/pull/14733)) [@jameslamb](https://github.com/jameslamb)
- Remove unneeded methods in Column ([#14730](https://github.com/rapidsai/cudf/pull/14730)) [@mroeschke](https://github.com/mroeschke)
- Clean up base column methods ([#14725](https://github.com/rapidsai/cudf/pull/14725)) [@mroeschke](https://github.com/mroeschke)
- Ensure column.fillna signatures are consistent ([#14724](https://github.com/rapidsai/cudf/pull/14724)) [@mroeschke](https://github.com/mroeschke)
- Remove mimesis as a testing dependency ([#14723](https://github.com/rapidsai/cudf/pull/14723)) [@mroeschke](https://github.com/mroeschke)
- Replace as_numerical with as_numerical_column/codes ([#14719](https://github.com/rapidsai/cudf/pull/14719)) [@mroeschke](https://github.com/mroeschke)
- Use offsetalator in gather_chars ([#14700](https://github.com/rapidsai/cudf/pull/14700)) [@davidwendt](https://github.com/davidwendt)
- Use make_strings_children for fill() specialization logic ([#14697](https://github.com/rapidsai/cudf/pull/14697)) [@davidwendt](https://github.com/davidwendt)
- Change `io::detail::orc` namespace into `io::orc::detail` ([#14696](https://github.com/rapidsai/cudf/pull/14696)) [@ttnghia](https://github.com/ttnghia)
- Fix call to deprecated factory function ([#14695](https://github.com/rapidsai/cudf/pull/14695)) [@davidwendt](https://github.com/davidwendt)
- Use as_column instead of arange for range like inputs ([#14689](https://github.com/rapidsai/cudf/pull/14689)) [@mroeschke](https://github.com/mroeschke)
- Reorganize ORC reader into multiple files and perform some small fixes to cuIO code ([#14665](https://github.com/rapidsai/cudf/pull/14665)) [@ttnghia](https://github.com/ttnghia)
- Split parquet test into multiple files ([#14663](https://github.com/rapidsai/cudf/pull/14663)) [@etseidl](https://github.com/etseidl)
- Custom error messages for IO with nonexistent files ([#14662](https://github.com/rapidsai/cudf/pull/14662)) [@vuule](https://github.com/vuule)
- Explicitly pass .dtype into is_foo_dtype functions ([#14657](https://github.com/rapidsai/cudf/pull/14657)) [@mroeschke](https://github.com/mroeschke)
- Basic validation in reader benchmarks ([#14647](https://github.com/rapidsai/cudf/pull/14647)) [@vuule](https://github.com/vuule)
- Update dependencies.yaml to support CUDA 12.*. ([#14644](https://github.com/rapidsai/cudf/pull/14644)) [@bdice](https://github.com/bdice)
- Consolidate memoryview handling in as_column ([#14643](https://github.com/rapidsai/cudf/pull/14643)) [@mroeschke](https://github.com/mroeschke)
- Convert `FieldType` to scoped enum ([#14642](https://github.com/rapidsai/cudf/pull/14642)) [@vuule](https://github.com/vuule)
- Use instance over is_foo_dtype ([#14641](https://github.com/rapidsai/cudf/pull/14641)) [@mroeschke](https://github.com/mroeschke)
- Use isinstance over is_foo_dtype internally ([#14638](https://github.com/rapidsai/cudf/pull/14638)) [@mroeschke](https://github.com/mroeschke)
- Remove unnecessary **kwargs in function signatures ([#14635](https://github.com/rapidsai/cudf/pull/14635)) [@mroeschke](https://github.com/mroeschke)
- Drop nvbench patch for nvml. ([#14631](https://github.com/rapidsai/cudf/pull/14631)) [@bdice](https://github.com/bdice)
- Drop Pascal GPU support. ([#14630](https://github.com/rapidsai/cudf/pull/14630)) [@bdice](https://github.com/bdice)
- Add cpp/doxygen/xml to .gitignore ([#14613](https://github.com/rapidsai/cudf/pull/14613)) [@davidwendt](https://github.com/davidwendt)
- Create strings-specific make_offsets_child_column for multiple offset types ([#14612](https://github.com/rapidsai/cudf/pull/14612)) [@davidwendt](https://github.com/davidwendt)
- Use the offsetalator in cudf::concatenate for strings ([#14611](https://github.com/rapidsai/cudf/pull/14611)) [@davidwendt](https://github.com/davidwendt)
- Make Parquet ColumnIndex null_counts optional ([#14596](https://github.com/rapidsai/cudf/pull/14596)) [@etseidl](https://github.com/etseidl)
- Support `freq` in DatetimeIndex ([#14593](https://github.com/rapidsai/cudf/pull/14593)) [@shwina](https://github.com/shwina)
- Remove legacy benchmarks for cuDF-python ([#14591](https://github.com/rapidsai/cudf/pull/14591)) [@osidekyle](https://github.com/osidekyle)
- Remove WORKSPACE env var from cudf_test temp_directory class ([#14588](https://github.com/rapidsai/cudf/pull/14588)) [@davidwendt](https://github.com/davidwendt)
- Use exceptions instead of return values to handle errors in `CompactProtocolReader` ([#14582](https://github.com/rapidsai/cudf/pull/14582)) [@vuule](https://github.com/vuule)
- Use cuda::proclaim_return_type on device lambdas. ([#14577](https://github.com/rapidsai/cudf/pull/14577)) [@bdice](https://github.com/bdice)
- Update to CCCL 2.2.0. ([#14576](https://github.com/rapidsai/cudf/pull/14576)) [@bdice](https://github.com/bdice)
- Update dependencies.yaml to new pip index ([#14575](https://github.com/rapidsai/cudf/pull/14575)) [@vyasr](https://github.com/vyasr)
- Simplify Python CMake ([#14565](https://github.com/rapidsai/cudf/pull/14565)) [@vyasr](https://github.com/vyasr)
- Java expose parquet pass_read_limit ([#14564](https://github.com/rapidsai/cudf/pull/14564)) [@revans2](https://github.com/revans2)
- Add column sanitization checks in `CUDF_TEST_EXPECT_COLUMN_*` macros ([#14559](https://github.com/rapidsai/cudf/pull/14559)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- Use cudf_test temp_directory class for nvtext::subword_tokenize gbenchmark ([#14558](https://github.com/rapidsai/cudf/pull/14558)) [@davidwendt](https://github.com/davidwendt)
- Fix return type of prefix increment overloads ([#14544](https://github.com/rapidsai/cudf/pull/14544)) [@vuule](https://github.com/vuule)
- Make bpe_merge_pairs_impl member private ([#14543](https://github.com/rapidsai/cudf/pull/14543)) [@davidwendt](https://github.com/davidwendt)
- Small clean up in `io::statistics` ([#14542](https://github.com/rapidsai/cudf/pull/14542)) [@vuule](https://github.com/vuule)
- Change json gtest environment variable to compile-time definition ([#14541](https://github.com/rapidsai/cudf/pull/14541)) [@davidwendt](https://github.com/davidwendt)
- Remove extra total chars size calculation from cudf::concatenate ([#14540](https://github.com/rapidsai/cudf/pull/14540)) [@davidwendt](https://github.com/davidwendt)
- Refactor IndexedFrame.hash_values to use cudf::hashing functions, add xxhash64 to cudf Python. ([#14538](https://github.com/rapidsai/cudf/pull/14538)) [@bdice](https://github.com/bdice)
- Move non-templated inline function definitions from table_view.hpp to table_view.cpp ([#14535](https://github.com/rapidsai/cudf/pull/14535)) [@davidwendt](https://github.com/davidwendt)
- Add JNI for strings::code_points ([#14533](https://github.com/rapidsai/cudf/pull/14533)) [@thirtiseven](https://github.com/thirtiseven)
- Add a test for issue 12773 ([#14529](https://github.com/rapidsai/cudf/pull/14529)) [@vyasr](https://github.com/vyasr)
- Split libarrow build dependencies. ([#14506](https://github.com/rapidsai/cudf/pull/14506)) [@bdice](https://github.com/bdice)
- Implement `IndexedFrame.duplicated` with `distinct_indices` + `scatter` ([#14493](https://github.com/rapidsai/cudf/pull/14493)) [@wence-](https://github.com/wence-)
- Expunge as_frame conversions in Column algorithms ([#14491](https://github.com/rapidsai/cudf/pull/14491)) [@wence-](https://github.com/wence-)
- Remove unsanitized null from input strings column in rank_tests.cpp ([#14475](https://github.com/rapidsai/cudf/pull/14475)) [@davidwendt](https://github.com/davidwendt)
- Refactor Parquet kernel_error ([#14464](https://github.com/rapidsai/cudf/pull/14464)) [@etseidl](https://github.com/etseidl)
- Deprecate cudf::make_strings_column accepting typed offsets ([#14461](https://github.com/rapidsai/cudf/pull/14461)) [@davidwendt](https://github.com/davidwendt)
- Remove deprecated nvtext::load_merge_pairs_file ([#14460](https://github.com/rapidsai/cudf/pull/14460)) [@davidwendt](https://github.com/davidwendt)
- Introduce Comprehensive Pathological Unit Tests for Issue #14409 ([#14459](https://github.com/rapidsai/cudf/pull/14459)) [@aocsa](https://github.com/aocsa)
- Expose stream parameter in public nvtext APIs ([#14456](https://github.com/rapidsai/cudf/pull/14456)) [@davidwendt](https://github.com/davidwendt)
- Include encode type in the error message when unsupported Parquet encoding is detected ([#14453](https://github.com/rapidsai/cudf/pull/14453)) [@ZelboK](https://github.com/ZelboK)
- Remove null mask for zero nulls in json readers ([#14451](https://github.com/rapidsai/cudf/pull/14451)) [@karthikeyann](https://github.com/karthikeyann)
- Refactor cudf.Series.__init__ ([#14450](https://github.com/rapidsai/cudf/pull/14450)) [@mroeschke](https://github.com/mroeschke)
- Remove the use of `volatile` in Parquet ([#14448](https://github.com/rapidsai/cudf/pull/14448)) [@vuule](https://github.com/vuule)
- REF: Remove **kwargs from to_pandas, raise if nullable is not implemented ([#14438](https://github.com/rapidsai/cudf/pull/14438)) [@mroeschke](https://github.com/mroeschke)
- Testing stream pool implementation ([#14437](https://github.com/rapidsai/cudf/pull/14437)) [@shrshi](https://github.com/shrshi)
- Match pandas join ordering obligations in pandas-compatible mode ([#14428](https://github.com/rapidsai/cudf/pull/14428)) [@wence-](https://github.com/wence-)
- Forward-merge branch-23.12 to branch-24.02 ([#14426](https://github.com/rapidsai/cudf/pull/14426)) [@bdice](https://github.com/bdice)
- Use isinstance(..., cudf.IntervalDtype) instead of is_interval_dtype ([#14424](https://github.com/rapidsai/cudf/pull/14424)) [@mroeschke](https://github.com/mroeschke)
- Use isinstance(..., cudf.CategoricalDtype) instead of is_categorical_dtype ([#14423](https://github.com/rapidsai/cudf/pull/14423)) [@mroeschke](https://github.com/mroeschke)
- Forward-merge branch-23.12 to branch-24.02 ([#14422](https://github.com/rapidsai/cudf/pull/14422)) [@bdice](https://github.com/bdice)
- REF: Remove instances of pd.core ([#14421](https://github.com/rapidsai/cudf/pull/14421)) [@mroeschke](https://github.com/mroeschke)
- Expose streams in public filling APIs for label_bins ([#14401](https://github.com/rapidsai/cudf/pull/14401)) [@ZelboK](https://github.com/ZelboK)
- Consolidate 1D pandas object handling in as_column ([#14394](https://github.com/rapidsai/cudf/pull/14394)) [@mroeschke](https://github.com/mroeschke)
- Limit DELTA_BINARY_PACKED encoder to the same number of bits as the physical type being encoded ([#14392](https://github.com/rapidsai/cudf/pull/14392)) [@etseidl](https://github.com/etseidl)
- Add SHA-1 and SHA-2 hash functions. ([#14391](https://github.com/rapidsai/cudf/pull/14391)) [@bdice](https://github.com/bdice)
- Expose streams in Parquet reader and writer APIs ([#14359](https://github.com/rapidsai/cudf/pull/14359)) [@shrshi](https://github.com/shrshi)
- Update to fmt 10.1.1 and spdlog 1.12.0. ([#14355](https://github.com/rapidsai/cudf/pull/14355)) [@bdice](https://github.com/bdice)
- Replace default stream for scalars and column factories usages (because of defaulted arguments) ([#14354](https://github.com/rapidsai/cudf/pull/14354)) [@karthikeyann](https://github.com/karthikeyann)
- Expose streams in ORC reader and writer APIs ([#14350](https://github.com/rapidsai/cudf/pull/14350)) [@shrshi](https://github.com/shrshi)
- Convert compression and io to string axis type in IO benchmarks ([#14347](https://github.com/rapidsai/cudf/pull/14347)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- Add cuDF devcontainers ([#14015](https://github.com/rapidsai/cudf/pull/14015)) [@trxcllnt](https://github.com/trxcllnt)
- Refactoring of Buffers (last step towards unifying COW and Spilling) ([#13801](https://github.com/rapidsai/cudf/pull/13801)) [@madsbk](https://github.com/madsbk)
- Switch to scikit-build-core ([#13531](https://github.com/rapidsai/cudf/pull/13531)) [@vyasr](https://github.com/vyasr)
- Simplify null count checking in column equality comparator ([#13312](https://github.com/rapidsai/cudf/pull/13312)) [@vyasr](https://github.com/vyasr)
# cuDF 23.12.00 (6 Dec 2023)
## 🚨 Breaking Changes
- Raise error in `reindex` when `index` is not unique ([#14400](https://github.com/rapidsai/cudf/pull/14400)) [@galipremsagar](https://github.com/galipremsagar)
- Expose stream parameter to get_json_object API ([#14297](https://github.com/rapidsai/cudf/pull/14297)) [@davidwendt](https://github.com/davidwendt)
- Refactor cudf_kafka to use skbuild ([#14292](https://github.com/rapidsai/cudf/pull/14292)) [@jdye64](https://github.com/jdye64)
- Expose stream parameter in public strings convert APIs ([#14255](https://github.com/rapidsai/cudf/pull/14255)) [@davidwendt](https://github.com/davidwendt)
- Upgrade to nvCOMP 3.0.4 ([#13815](https://github.com/rapidsai/cudf/pull/13815)) [@vuule](https://github.com/vuule)
## 🐛 Bug Fixes
- Update actions/labeler to v4 ([#14562](https://github.com/rapidsai/cudf/pull/14562)) [@raydouglass](https://github.com/raydouglass)
- Fix data corruption when skipping rows ([#14557](https://github.com/rapidsai/cudf/pull/14557)) [@etseidl](https://github.com/etseidl)
- Fix function name typo in `cudf.pandas` profiler ([#14514](https://github.com/rapidsai/cudf/pull/14514)) [@galipremsagar](https://github.com/galipremsagar)
- Fix intermediate type checking in expression parsing ([#14445](https://github.com/rapidsai/cudf/pull/14445)) [@vyasr](https://github.com/vyasr)
- Forward merge `branch-23.10` into `branch-23.12` ([#14435](https://github.com/rapidsai/cudf/pull/14435)) [@raydouglass](https://github.com/raydouglass)
- Remove needs: wheel-build-cudf. ([#14427](https://github.com/rapidsai/cudf/pull/14427)) [@bdice](https://github.com/bdice)
- Fix dask dependency in custreamz ([#14420](https://github.com/rapidsai/cudf/pull/14420)) [@vyasr](https://github.com/vyasr)
- Ensure nvbench initializes nvml context when built statically ([#14411](https://github.com/rapidsai/cudf/pull/14411)) [@robertmaynard](https://github.com/robertmaynard)
- Support java AST String literal with desired encoding ([#14402](https://github.com/rapidsai/cudf/pull/14402)) [@winningsix](https://github.com/winningsix)
- Raise error in `reindex` when `index` is not unique ([#14400](https://github.com/rapidsai/cudf/pull/14400)) [@galipremsagar](https://github.com/galipremsagar)
- Always build nvbench statically so we don't need to package it ([#14399](https://github.com/rapidsai/cudf/pull/14399)) [@robertmaynard](https://github.com/robertmaynard)
- Fix token-count logic in nvtext::tokenize_with_vocabulary ([#14393](https://github.com/rapidsai/cudf/pull/14393)) [@davidwendt](https://github.com/davidwendt)
- Fix as_column(pd.Timestamp/Timedelta, length=) not respecting length ([#14390](https://github.com/rapidsai/cudf/pull/14390)) [@mroeschke](https://github.com/mroeschke)
- cudf.pandas: cuDF subpath checking in module `__getattr__` ([#14388](https://github.com/rapidsai/cudf/pull/14388)) [@shwina](https://github.com/shwina)
- Fix and disable encoding for nanosecond statistics in ORC writer ([#14367](https://github.com/rapidsai/cudf/pull/14367)) [@vuule](https://github.com/vuule)
- Add the new manylinux builds to the build job ([#14351](https://github.com/rapidsai/cudf/pull/14351)) [@vyasr](https://github.com/vyasr)
- cudf jit parser now supports .pragma instructions with quotes ([#14348](https://github.com/rapidsai/cudf/pull/14348)) [@robertmaynard](https://github.com/robertmaynard)
- Fix overflow check in `cudf::merge` ([#14345](https://github.com/rapidsai/cudf/pull/14345)) [@divyegala](https://github.com/divyegala)
- Add cramjam ([#14344](https://github.com/rapidsai/cudf/pull/14344)) [@vyasr](https://github.com/vyasr)
- Enable `dask_cudf/io` pytests in CI ([#14338](https://github.com/rapidsai/cudf/pull/14338)) [@galipremsagar](https://github.com/galipremsagar)
- Temporarily avoid the current build of pydata-sphinx-theme ([#14332](https://github.com/rapidsai/cudf/pull/14332)) [@vyasr](https://github.com/vyasr)
- Fix host buffer access from device function in the Parquet reader ([#14328](https://github.com/rapidsai/cudf/pull/14328)) [@vuule](https://github.com/vuule)
- Run IO tests for Dask-cuDF ([#14327](https://github.com/rapidsai/cudf/pull/14327)) [@rjzamora](https://github.com/rjzamora)
- Fix logical type issues in the Parquet writer ([#14322](https://github.com/rapidsai/cudf/pull/14322)) [@vuule](https://github.com/vuule)
- Remove aws-sdk-pinning and revert to arrow 12.0.1 ([#14319](https://github.com/rapidsai/cudf/pull/14319)) [@vyasr](https://github.com/vyasr)
- test is_valid before reading column data ([#14318](https://github.com/rapidsai/cudf/pull/14318)) [@etseidl](https://github.com/etseidl)
- Fix gtest validity setting for TextTokenizeTest.Vocabulary ([#14312](https://github.com/rapidsai/cudf/pull/14312)) [@davidwendt](https://github.com/davidwendt)
- Fixes stack context for json lines format that recovers from invalid JSON lines ([#14309](https://github.com/rapidsai/cudf/pull/14309)) [@elstehle](https://github.com/elstehle)
- Downgrade to Arrow 12.0.0 for aws-sdk-cpp and fix cudf_kafka builds for new CI containers ([#14296](https://github.com/rapidsai/cudf/pull/14296)) [@vyasr](https://github.com/vyasr)
- fixing thread index overflow issue ([#14290](https://github.com/rapidsai/cudf/pull/14290)) [@hyperbolic2346](https://github.com/hyperbolic2346)
- Fix memset error in nvtext::edit_distance_matrix ([#14283](https://github.com/rapidsai/cudf/pull/14283)) [@davidwendt](https://github.com/davidwendt)
- Changes JSON reader's recovery option's behaviour to ignore all characters after a valid JSON record ([#14279](https://github.com/rapidsai/cudf/pull/14279)) [@elstehle](https://github.com/elstehle)
- Handle empty string correctly in Parquet statistics ([#14257](https://github.com/rapidsai/cudf/pull/14257)) [@etseidl](https://github.com/etseidl)
- Fixes behaviour for incomplete lines when `recover_with_nulls` is enabled ([#14252](https://github.com/rapidsai/cudf/pull/14252)) [@elstehle](https://github.com/elstehle)
- cudf::detail::pinned_allocator doesn't throw from `deallocate` ([#14251](https://github.com/rapidsai/cudf/pull/14251)) [@robertmaynard](https://github.com/robertmaynard)
- Fix strings replace for adjacent, identical multi-byte UTF-8 character targets ([#14235](https://github.com/rapidsai/cudf/pull/14235)) [@davidwendt](https://github.com/davidwendt)
- Fix the precision when converting a decimal128 column to an arrow array ([#14230](https://github.com/rapidsai/cudf/pull/14230)) [@jihoonson](https://github.com/jihoonson)
- Fixing parquet list of struct interpretation ([#13715](https://github.com/rapidsai/cudf/pull/13715)) [@hyperbolic2346](https://github.com/hyperbolic2346)
## 📖 Documentation
- Fix io reference in docs. ([#14452](https://github.com/rapidsai/cudf/pull/14452)) [@bdice](https://github.com/bdice)
- Update README ([#14374](https://github.com/rapidsai/cudf/pull/14374)) [@shwina](https://github.com/shwina)
- Example code for blog on new row comparators ([#13795](https://github.com/rapidsai/cudf/pull/13795)) [@divyegala](https://github.com/divyegala)
## 🚀 New Features
- Expose streams in public unary APIs ([#14342](https://github.com/rapidsai/cudf/pull/14342)) [@vyasr](https://github.com/vyasr)
- Add python tests for Parquet DELTA_BINARY_PACKED encoder ([#14316](https://github.com/rapidsai/cudf/pull/14316)) [@etseidl](https://github.com/etseidl)
- Update rapids-cmake functions to non-deprecated signatures ([#14265](https://github.com/rapidsai/cudf/pull/14265)) [@robertmaynard](https://github.com/robertmaynard)
- Expose streams in public null mask APIs ([#14263](https://github.com/rapidsai/cudf/pull/14263)) [@vyasr](https://github.com/vyasr)
- Expose streams in binaryop APIs ([#14187](https://github.com/rapidsai/cudf/pull/14187)) [@vyasr](https://github.com/vyasr)
- Add pylibcudf.Scalar that interoperates with Arrow scalars ([#14133](https://github.com/rapidsai/cudf/pull/14133)) [@vyasr](https://github.com/vyasr)
- Add decoder for DELTA_BYTE_ARRAY to Parquet reader ([#14101](https://github.com/rapidsai/cudf/pull/14101)) [@etseidl](https://github.com/etseidl)
- Add DELTA_BINARY_PACKED encoder for Parquet writer ([#14100](https://github.com/rapidsai/cudf/pull/14100)) [@etseidl](https://github.com/etseidl)
- Add BytePairEncoder class to cuDF ([#13891](https://github.com/rapidsai/cudf/pull/13891)) [@davidwendt](https://github.com/davidwendt)
- Upgrade to nvCOMP 3.0.4 ([#13815](https://github.com/rapidsai/cudf/pull/13815)) [@vuule](https://github.com/vuule)
- Use `pynvjitlink` for CUDA 12+ MVC ([#13650](https://github.com/rapidsai/cudf/pull/13650)) [@brandon-b-miller](https://github.com/brandon-b-miller)
## 🛠️ Improvements
- Build concurrency for nightly and merge triggers ([#14441](https://github.com/rapidsai/cudf/pull/14441)) [@bdice](https://github.com/bdice)
- Cleanup remaining usages of dask dependencies ([#14407](https://github.com/rapidsai/cudf/pull/14407)) [@galipremsagar](https://github.com/galipremsagar)
- Update to Arrow 14.0.1. ([#14387](https://github.com/rapidsai/cudf/pull/14387)) [@bdice](https://github.com/bdice)
- Remove Cython libcpp wrappers ([#14382](https://github.com/rapidsai/cudf/pull/14382)) [@vyasr](https://github.com/vyasr)
- Forward-merge branch-23.10 to branch-23.12 ([#14372](https://github.com/rapidsai/cudf/pull/14372)) [@bdice](https://github.com/bdice)
- Upgrade to arrow 14 ([#14371](https://github.com/rapidsai/cudf/pull/14371)) [@galipremsagar](https://github.com/galipremsagar)
- Fix a pytest typo in `test_kurt_skew_error` ([#14368](https://github.com/rapidsai/cudf/pull/14368)) [@galipremsagar](https://github.com/galipremsagar)
- Use new rapids-dask-dependency metapackage for managing dask versions ([#14364](https://github.com/rapidsai/cudf/pull/14364)) [@vyasr](https://github.com/vyasr)
- Change `nullable()` to `has_nulls()` in `cudf::detail::gather` ([#14363](https://github.com/rapidsai/cudf/pull/14363)) [@divyegala](https://github.com/divyegala)
- Split up scan_inclusive.cu to improve its compile time ([#14358](https://github.com/rapidsai/cudf/pull/14358)) [@davidwendt](https://github.com/davidwendt)
- Implement user_datasource_wrapper is_empty() and is_device_read_preferred(). ([#14357](https://github.com/rapidsai/cudf/pull/14357)) [@tpn](https://github.com/tpn)
- Added streams to CSV reader and writer api ([#14340](https://github.com/rapidsai/cudf/pull/14340)) [@shrshi](https://github.com/shrshi)
- Upgrade wheels to use arrow 13 ([#14339](https://github.com/rapidsai/cudf/pull/14339)) [@vyasr](https://github.com/vyasr)
- Rework nvtext::byte_pair_encoding API ([#14337](https://github.com/rapidsai/cudf/pull/14337)) [@davidwendt](https://github.com/davidwendt)
- Improve performance of nvtext::tokenize_with_vocabulary for long strings ([#14336](https://github.com/rapidsai/cudf/pull/14336)) [@davidwendt](https://github.com/davidwendt)
- Upgrade `arrow` to `13` ([#14330](https://github.com/rapidsai/cudf/pull/14330)) [@galipremsagar](https://github.com/galipremsagar)
- Expose stream parameter in public nvtext replace APIs ([#14329](https://github.com/rapidsai/cudf/pull/14329)) [@davidwendt](https://github.com/davidwendt)
- Drop `pyorc` dependency and use `pandas`/`pyarrow` instead ([#14323](https://github.com/rapidsai/cudf/pull/14323)) [@galipremsagar](https://github.com/galipremsagar)
- Avoid `pyarrow.fs` import for local storage ([#14321](https://github.com/rapidsai/cudf/pull/14321)) [@rjzamora](https://github.com/rjzamora)
- Unpin `dask` and `distributed` for `23.12` development ([#14320](https://github.com/rapidsai/cudf/pull/14320)) [@galipremsagar](https://github.com/galipremsagar)
- Expose stream parameter in public nvtext tokenize APIs ([#14317](https://github.com/rapidsai/cudf/pull/14317)) [@davidwendt](https://github.com/davidwendt)
- Added streams to JSON reader and writer api ([#14313](https://github.com/rapidsai/cudf/pull/14313)) [@shrshi](https://github.com/shrshi)
- Minor improvements in `source_info` ([#14308](https://github.com/rapidsai/cudf/pull/14308)) [@vuule](https://github.com/vuule)
- Forward-merge branch-23.10 to branch-23.12 ([#14307](https://github.com/rapidsai/cudf/pull/14307)) [@bdice](https://github.com/bdice)
- Add stream parameter to Set Operations (Public List APIs) ([#14305](https://github.com/rapidsai/cudf/pull/14305)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- Expose stream parameter to get_json_object API ([#14297](https://github.com/rapidsai/cudf/pull/14297)) [@davidwendt](https://github.com/davidwendt)
- Sort dictionary data alphabetically in the ORC writer ([#14295](https://github.com/rapidsai/cudf/pull/14295)) [@vuule](https://github.com/vuule)
- Expose stream parameter in public strings filter APIs ([#14293](https://github.com/rapidsai/cudf/pull/14293)) [@davidwendt](https://github.com/davidwendt)
- Refactor cudf_kafka to use skbuild ([#14292](https://github.com/rapidsai/cudf/pull/14292)) [@jdye64](https://github.com/jdye64)
- Update `shared-action-workflows` references ([#14289](https://github.com/rapidsai/cudf/pull/14289)) [@AyodeAwe](https://github.com/AyodeAwe)
- Register ``partd`` encode dispatch in ``dask_cudf`` ([#14287](https://github.com/rapidsai/cudf/pull/14287)) [@rjzamora](https://github.com/rjzamora)
- Update versioning strategy ([#14285](https://github.com/rapidsai/cudf/pull/14285)) [@vyasr](https://github.com/vyasr)
- Move and rename byte-pair-encoding source files ([#14284](https://github.com/rapidsai/cudf/pull/14284)) [@davidwendt](https://github.com/davidwendt)
- Expose stream parameter in public strings combine APIs ([#14281](https://github.com/rapidsai/cudf/pull/14281)) [@davidwendt](https://github.com/davidwendt)
- Expose stream parameter in public strings contains APIs ([#14280](https://github.com/rapidsai/cudf/pull/14280)) [@davidwendt](https://github.com/davidwendt)
- Add stream parameter to List Sort and Filter APIs ([#14272](https://github.com/rapidsai/cudf/pull/14272)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- Use branch-23.12 workflows. ([#14271](https://github.com/rapidsai/cudf/pull/14271)) [@bdice](https://github.com/bdice)
- Refactor LogicalType for Parquet ([#14264](https://github.com/rapidsai/cudf/pull/14264)) [@etseidl](https://github.com/etseidl)
- Centralize chunked reading code in the parquet reader to reader_impl_chunking.cu ([#14262](https://github.com/rapidsai/cudf/pull/14262)) [@nvdbaranec](https://github.com/nvdbaranec)
- Expose stream parameter in public strings replace APIs ([#14261](https://github.com/rapidsai/cudf/pull/14261)) [@davidwendt](https://github.com/davidwendt)
- Expose stream parameter in public strings APIs ([#14260](https://github.com/rapidsai/cudf/pull/14260)) [@davidwendt](https://github.com/davidwendt)
- Cleanup of namespaces in parquet code. ([#14259](https://github.com/rapidsai/cudf/pull/14259)) [@nvdbaranec](https://github.com/nvdbaranec)
- Make parquet schema index type consistent ([#14256](https://github.com/rapidsai/cudf/pull/14256)) [@hyperbolic2346](https://github.com/hyperbolic2346)
- Expose stream parameter in public strings convert APIs ([#14255](https://github.com/rapidsai/cudf/pull/14255)) [@davidwendt](https://github.com/davidwendt)
- Add in java bindings for DataSource ([#14254](https://github.com/rapidsai/cudf/pull/14254)) [@revans2](https://github.com/revans2)
- Reimplement `cudf::merge` for nested types without using comparators ([#14250](https://github.com/rapidsai/cudf/pull/14250)) [@divyegala](https://github.com/divyegala)
- Add stream parameter to List Manipulation and Operations APIs ([#14248](https://github.com/rapidsai/cudf/pull/14248)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- Expose stream parameter in public strings split/partition APIs ([#14247](https://github.com/rapidsai/cudf/pull/14247)) [@davidwendt](https://github.com/davidwendt)
- Improve `contains_column` by invoking `contains_table` ([#14238](https://github.com/rapidsai/cudf/pull/14238)) [@PointKernel](https://github.com/PointKernel)
- Detect and report errors in Parquet header parsing ([#14237](https://github.com/rapidsai/cudf/pull/14237)) [@etseidl](https://github.com/etseidl)
- Normalizing offsets iterator ([#14234](https://github.com/rapidsai/cudf/pull/14234)) [@davidwendt](https://github.com/davidwendt)
- Forward merge `23.10` into `23.12` ([#14231](https://github.com/rapidsai/cudf/pull/14231)) [@galipremsagar](https://github.com/galipremsagar)
- Return error if BOOL8 column-type is used with integers-to-hex ([#14208](https://github.com/rapidsai/cudf/pull/14208)) [@davidwendt](https://github.com/davidwendt)
- Enable indexalator for device code ([#14206](https://github.com/rapidsai/cudf/pull/14206)) [@davidwendt](https://github.com/davidwendt)
- Marginally reduce memory footprint of joins ([#14197](https://github.com/rapidsai/cudf/pull/14197)) [@wence-](https://github.com/wence-)
- Add nvtx annotations to spilling-based data movement ([#14196](https://github.com/rapidsai/cudf/pull/14196)) [@wence-](https://github.com/wence-)
- Optimize ORC writer for decimal columns ([#14190](https://github.com/rapidsai/cudf/pull/14190)) [@vuule](https://github.com/vuule)
- Remove the use of volatile in ORC ([#14175](https://github.com/rapidsai/cudf/pull/14175)) [@vuule](https://github.com/vuule)
- Add `bytes_per_second` to distinct_count of stream_compaction nvbench. ([#14172](https://github.com/rapidsai/cudf/pull/14172)) [@Blonck](https://github.com/Blonck)
- Add `bytes_per_second` to transpose benchmark ([#14170](https://github.com/rapidsai/cudf/pull/14170)) [@Blonck](https://github.com/Blonck)
- cuDF: Build CUDA 12.0 ARM conda packages. ([#14112](https://github.com/rapidsai/cudf/pull/14112)) [@bdice](https://github.com/bdice)
- Add `bytes_per_second` to shift benchmark ([#13950](https://github.com/rapidsai/cudf/pull/13950)) [@Blonck](https://github.com/Blonck)
- Extract `debug_utilities.hpp/cu` from `column_utilities.hpp/cu` ([#13720](https://github.com/rapidsai/cudf/pull/13720)) [@ttnghia](https://github.com/ttnghia)
# cuDF 23.10.00 (11 Oct 2023)
## 🚨 Breaking Changes
- Expose stream parameter in public nvtext ngram APIs ([#14061](https://github.com/rapidsai/cudf/pull/14061)) [@davidwendt](https://github.com/davidwendt)
- Raise `MixedTypeError` when a column of mixed-dtype is being constructed ([#14050](https://github.com/rapidsai/cudf/pull/14050)) [@galipremsagar](https://github.com/galipremsagar)
- Raise `NotImplementedError` for `MultiIndex.to_series` ([#14049](https://github.com/rapidsai/cudf/pull/14049)) [@galipremsagar](https://github.com/galipremsagar)
- Create table_input_metadata from a table_metadata ([#13920](https://github.com/rapidsai/cudf/pull/13920)) [@etseidl](https://github.com/etseidl)
- Enable RLE boolean encoding for v2 Parquet files ([#13886](https://github.com/rapidsai/cudf/pull/13886)) [@etseidl](https://github.com/etseidl)
- Change `NA` to `NaT` for `datetime` and `timedelta` types ([#13868](https://github.com/rapidsai/cudf/pull/13868)) [@galipremsagar](https://github.com/galipremsagar)
- Fix `any`, `all` reduction behavior for `axis=None` and warn for other reductions ([#13831](https://github.com/rapidsai/cudf/pull/13831)) [@galipremsagar](https://github.com/galipremsagar)
- Add minhash support for MurmurHash3_x64_128 ([#13796](https://github.com/rapidsai/cudf/pull/13796)) [@davidwendt](https://github.com/davidwendt)
- Remove the libcudf cudf::offset_type type ([#13788](https://github.com/rapidsai/cudf/pull/13788)) [@davidwendt](https://github.com/davidwendt)
- Raise error when trying to join `datetime` and `timedelta` types with other types ([#13786](https://github.com/rapidsai/cudf/pull/13786)) [@galipremsagar](https://github.com/galipremsagar)
- Update to Cython 3.0.0 ([#13777](https://github.com/rapidsai/cudf/pull/13777)) [@vyasr](https://github.com/vyasr)
- Raise error on constructing an array from mixed type inputs ([#13768](https://github.com/rapidsai/cudf/pull/13768)) [@galipremsagar](https://github.com/galipremsagar)
- Enforce deprecations in `23.10` ([#13732](https://github.com/rapidsai/cudf/pull/13732)) [@galipremsagar](https://github.com/galipremsagar)
- Upgrade to arrow 12 ([#13728](https://github.com/rapidsai/cudf/pull/13728)) [@galipremsagar](https://github.com/galipremsagar)
- Remove Arrow dependency from the `datasource.hpp` public header ([#13698](https://github.com/rapidsai/cudf/pull/13698)) [@vuule](https://github.com/vuule)
## 🐛 Bug Fixes
- Fix inaccurate ceil/floor and inaccurate rescaling casts of fixed-point values. ([#14242](https://github.com/rapidsai/cudf/pull/14242)) [@bdice](https://github.com/bdice)
- Fix inaccuracy in decimal128 rounding. ([#14233](https://github.com/rapidsai/cudf/pull/14233)) [@bdice](https://github.com/bdice)
- Workaround for illegal instruction error in sm90 for warp instrinsics with mask ([#14201](https://github.com/rapidsai/cudf/pull/14201)) [@karthikeyann](https://github.com/karthikeyann)
- Fix pytorch related pytest ([#14198](https://github.com/rapidsai/cudf/pull/14198)) [@galipremsagar](https://github.com/galipremsagar)
- Pin to `aws-sdk-cpp<1.11` ([#14173](https://github.com/rapidsai/cudf/pull/14173)) [@pentschev](https://github.com/pentschev)
- Fix assert failure for range window functions ([#14168](https://github.com/rapidsai/cudf/pull/14168)) [@mythrocks](https://github.com/mythrocks)
- Fix Memcheck error found in JSON_TEST JsonReaderTest.ErrorStrings ([#14164](https://github.com/rapidsai/cudf/pull/14164)) [@karthikeyann](https://github.com/karthikeyann)
- Fix calls to copy_bitmask to pass stream parameter ([#14158](https://github.com/rapidsai/cudf/pull/14158)) [@davidwendt](https://github.com/davidwendt)
- Fix DataFrame from Series with different CategoricalIndexes ([#14157](https://github.com/rapidsai/cudf/pull/14157)) [@mroeschke](https://github.com/mroeschke)
- Pin to numpy<1.25 and numba<0.58 to avoid errors and deprecation warnings-as-errors. ([#14156](https://github.com/rapidsai/cudf/pull/14156)) [@bdice](https://github.com/bdice)
- Fix kernel launch error for cudf::io::orc::gpu::rowgroup_char_counts_kernel ([#14139](https://github.com/rapidsai/cudf/pull/14139)) [@davidwendt](https://github.com/davidwendt)
- Don't sort columns for DataFrame init from list of Series ([#14136](https://github.com/rapidsai/cudf/pull/14136)) [@mroeschke](https://github.com/mroeschke)
- Fix DataFrame.values with no columns but index ([#14134](https://github.com/rapidsai/cudf/pull/14134)) [@mroeschke](https://github.com/mroeschke)
- Avoid circular cimports in _lib/cpp/reduce.pxd ([#14125](https://github.com/rapidsai/cudf/pull/14125)) [@vyasr](https://github.com/vyasr)
- Add support for nested dict in `DataFrame` constructor ([#14119](https://github.com/rapidsai/cudf/pull/14119)) [@galipremsagar](https://github.com/galipremsagar)
- Restrict iterables of `DataFrame`'s as input to `DataFrame` constructor ([#14118](https://github.com/rapidsai/cudf/pull/14118)) [@galipremsagar](https://github.com/galipremsagar)
- Allow `numeric_only=True` for reduction operations on numeric types ([#14111](https://github.com/rapidsai/cudf/pull/14111)) [@galipremsagar](https://github.com/galipremsagar)
- Preserve name of the column while initializing a `DataFrame` ([#14110](https://github.com/rapidsai/cudf/pull/14110)) [@galipremsagar](https://github.com/galipremsagar)
- Correct numerous 20054-D: dynamic initialization errors found on arm+12.2 ([#14108](https://github.com/rapidsai/cudf/pull/14108)) [@robertmaynard](https://github.com/robertmaynard)
- Drop `kwargs` from `Series.count` ([#14106](https://github.com/rapidsai/cudf/pull/14106)) [@galipremsagar](https://github.com/galipremsagar)
- Fix naming issues with `Index.to_frame` and `MultiIndex.to_frame` APIs ([#14105](https://github.com/rapidsai/cudf/pull/14105)) [@galipremsagar](https://github.com/galipremsagar)
- Only use memory resources that haven't been freed ([#14103](https://github.com/rapidsai/cudf/pull/14103)) [@robertmaynard](https://github.com/robertmaynard)
- Add support for `__round__` in `Series` and `DataFrame` ([#14099](https://github.com/rapidsai/cudf/pull/14099)) [@galipremsagar](https://github.com/galipremsagar)
- Validate ignore_index type in drop_duplicates ([#14098](https://github.com/rapidsai/cudf/pull/14098)) [@mroeschke](https://github.com/mroeschke)
- Fix renaming `Series` and `Index` ([#14080](https://github.com/rapidsai/cudf/pull/14080)) [@galipremsagar](https://github.com/galipremsagar)
- Raise NotImplementedError in to_datetime if Z (or tz component) in string ([#14074](https://github.com/rapidsai/cudf/pull/14074)) [@mroeschke](https://github.com/mroeschke)
- Raise NotImplementedError for datetime strings with UTC offset ([#14070](https://github.com/rapidsai/cudf/pull/14070)) [@mroeschke](https://github.com/mroeschke)
- Update pyarrow-related dispatch logic in dask_cudf ([#14069](https://github.com/rapidsai/cudf/pull/14069)) [@rjzamora](https://github.com/rjzamora)
- Use `conda mambabuild` rather than `mamba mambabuild` ([#14067](https://github.com/rapidsai/cudf/pull/14067)) [@wence-](https://github.com/wence-)
- Raise NotImplementedError in to_datetime with dayfirst without infer_format ([#14058](https://github.com/rapidsai/cudf/pull/14058)) [@mroeschke](https://github.com/mroeschke)
- Fix various issues in `Index.intersection` ([#14054](https://github.com/rapidsai/cudf/pull/14054)) [@galipremsagar](https://github.com/galipremsagar)
- Fix `Index.difference` to match with pandas ([#14053](https://github.com/rapidsai/cudf/pull/14053)) [@galipremsagar](https://github.com/galipremsagar)
- Fix empty string column construction ([#14052](https://github.com/rapidsai/cudf/pull/14052)) [@galipremsagar](https://github.com/galipremsagar)
- Fix `IntervalIndex.union` to preserve type-metadata ([#14051](https://github.com/rapidsai/cudf/pull/14051)) [@galipremsagar](https://github.com/galipremsagar)
- Raise `MixedTypeError` when a column of mixed-dtype is being constructed ([#14050](https://github.com/rapidsai/cudf/pull/14050)) [@galipremsagar](https://github.com/galipremsagar)
- Raise `NotImplementedError` for `MultiIndex.to_series` ([#14049](https://github.com/rapidsai/cudf/pull/14049)) [@galipremsagar](https://github.com/galipremsagar)
- Ignore compile_commands.json ([#14048](https://github.com/rapidsai/cudf/pull/14048)) [@harrism](https://github.com/harrism)
- Raise TypeError for any non-parseable argument in to_datetime ([#14044](https://github.com/rapidsai/cudf/pull/14044)) [@mroeschke](https://github.com/mroeschke)
- Raise NotImplementedError for to_datetime with z format ([#14037](https://github.com/rapidsai/cudf/pull/14037)) [@mroeschke](https://github.com/mroeschke)
- Implement `sort_remaining` for `sort_index` ([#14033](https://github.com/rapidsai/cudf/pull/14033)) [@wence-](https://github.com/wence-)
- Raise NotImplementedError for Categoricals with timezones ([#14032](https://github.com/rapidsai/cudf/pull/14032)) [@mroeschke](https://github.com/mroeschke)
- Temporary fix Parquet metadata with empty value string being ignored from writing ([#14026](https://github.com/rapidsai/cudf/pull/14026)) [@ttnghia](https://github.com/ttnghia)
- Preserve types of scalar being returned when possible in `quantile` ([#14014](https://github.com/rapidsai/cudf/pull/14014)) [@galipremsagar](https://github.com/galipremsagar)
- Fix return type of `MultiIndex.difference` ([#14009](https://github.com/rapidsai/cudf/pull/14009)) [@galipremsagar](https://github.com/galipremsagar)
- Raise an error when timezone subtypes are encountered in `pd.IntervalDtype` ([#14006](https://github.com/rapidsai/cudf/pull/14006)) [@galipremsagar](https://github.com/galipremsagar)
- Fix map column can not be non-nullable for java ([#14003](https://github.com/rapidsai/cudf/pull/14003)) [@res-life](https://github.com/res-life)
- Fix `name` selection in `Index.difference` and `Index.intersection` ([#13986](https://github.com/rapidsai/cudf/pull/13986)) [@galipremsagar](https://github.com/galipremsagar)
- Restore column type metadata with `dropna` to fix `factorize` API ([#13980](https://github.com/rapidsai/cudf/pull/13980)) [@galipremsagar](https://github.com/galipremsagar)
- Use thread_index_type to avoid out of bounds accesses in conditional joins ([#13971](https://github.com/rapidsai/cudf/pull/13971)) [@vyasr](https://github.com/vyasr)
- Fix `MultiIndex.to_numpy` to return numpy array with tuples ([#13966](https://github.com/rapidsai/cudf/pull/13966)) [@galipremsagar](https://github.com/galipremsagar)
- Use cudf::thread_index_type in get_json_object and tdigest kernels ([#13962](https://github.com/rapidsai/cudf/pull/13962)) [@nvdbaranec](https://github.com/nvdbaranec)
- Fix an issue with `IntervalIndex.repr` when null values are present ([#13958](https://github.com/rapidsai/cudf/pull/13958)) [@galipremsagar](https://github.com/galipremsagar)
- Fix type metadata issue preservation with `Column.unique` ([#13957](https://github.com/rapidsai/cudf/pull/13957)) [@galipremsagar](https://github.com/galipremsagar)
- Handle `Interval` scalars when passed in list-like inputs to `cudf.Index` ([#13956](https://github.com/rapidsai/cudf/pull/13956)) [@galipremsagar](https://github.com/galipremsagar)
- Fix setting of categories order when `dtype` is passed to a `CategoricalColumn` ([#13955](https://github.com/rapidsai/cudf/pull/13955)) [@galipremsagar](https://github.com/galipremsagar)
- Handle `as_index` in `GroupBy.apply` ([#13951](https://github.com/rapidsai/cudf/pull/13951)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Raise error for string types in `nsmallest` and `nlargest` ([#13946](https://github.com/rapidsai/cudf/pull/13946)) [@galipremsagar](https://github.com/galipremsagar)
- Fix `index` of `Groupby.apply` results when it is performed on empty objects ([#13944](https://github.com/rapidsai/cudf/pull/13944)) [@galipremsagar](https://github.com/galipremsagar)
- Fix integer overflow in shim `device_sum` functions ([#13943](https://github.com/rapidsai/cudf/pull/13943)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Fix type mismatch in groupby reduction for empty objects ([#13942](https://github.com/rapidsai/cudf/pull/13942)) [@galipremsagar](https://github.com/galipremsagar)
- Fixed processed bytes calculation in APPLY_BOOLEAN_MASK benchmark. ([#13937](https://github.com/rapidsai/cudf/pull/13937)) [@Blonck](https://github.com/Blonck)
- Fix construction of `Grouping` objects ([#13932](https://github.com/rapidsai/cudf/pull/13932)) [@galipremsagar](https://github.com/galipremsagar)
- Fix an issue with `loc` when column names is `MultiIndex` ([#13929](https://github.com/rapidsai/cudf/pull/13929)) [@galipremsagar](https://github.com/galipremsagar)
- Fix handling of typecasting in `searchsorted` ([#13925](https://github.com/rapidsai/cudf/pull/13925)) [@galipremsagar](https://github.com/galipremsagar)
- Preserve index `name` in `reindex` ([#13917](https://github.com/rapidsai/cudf/pull/13917)) [@galipremsagar](https://github.com/galipremsagar)
- Use `cudf::thread_index_type` in cuIO to prevent overflow in row indexing ([#13910](https://github.com/rapidsai/cudf/pull/13910)) [@vuule](https://github.com/vuule)
- Fix for encodings listed in the Parquet column chunk metadata ([#13907](https://github.com/rapidsai/cudf/pull/13907)) [@etseidl](https://github.com/etseidl)
- Use cudf::thread_index_type in concatenate.cu. ([#13906](https://github.com/rapidsai/cudf/pull/13906)) [@bdice](https://github.com/bdice)
- Use cudf::thread_index_type in replace.cu. ([#13905](https://github.com/rapidsai/cudf/pull/13905)) [@bdice](https://github.com/bdice)
- Add noSanitizer tag to Java reduction tests failing with sanitizer in CUDA 12 ([#13904](https://github.com/rapidsai/cudf/pull/13904)) [@jlowe](https://github.com/jlowe)
- Remove the internal use of the cudf's default stream in cuIO ([#13903](https://github.com/rapidsai/cudf/pull/13903)) [@vuule](https://github.com/vuule)
- Use cuda-nvtx-dev CUDA 12 package. ([#13901](https://github.com/rapidsai/cudf/pull/13901)) [@bdice](https://github.com/bdice)
- Use `thread_index_type` to avoid index overflow in grid-stride loops ([#13895](https://github.com/rapidsai/cudf/pull/13895)) [@PointKernel](https://github.com/PointKernel)
- Fix memory access error in cudf::shift for sliced strings ([#13894](https://github.com/rapidsai/cudf/pull/13894)) [@davidwendt](https://github.com/davidwendt)
- Raise error when trying to construct a `DataFrame` with mixed types ([#13889](https://github.com/rapidsai/cudf/pull/13889)) [@galipremsagar](https://github.com/galipremsagar)
- Return `nan` when one variable to be correlated has zero variance in JIT GroupBy Apply ([#13884](https://github.com/rapidsai/cudf/pull/13884)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Correctly detect the BOM mark in `read_csv` with compressed input ([#13881](https://github.com/rapidsai/cudf/pull/13881)) [@vuule](https://github.com/vuule)
- Check for the presence of all values in `MultiIndex.isin` ([#13879](https://github.com/rapidsai/cudf/pull/13879)) [@galipremsagar](https://github.com/galipremsagar)
- Fix nvtext::generate_character_ngrams performance regression for longer strings ([#13874](https://github.com/rapidsai/cudf/pull/13874)) [@davidwendt](https://github.com/davidwendt)
- Fix return type of `MultiIndex.levels` ([#13870](https://github.com/rapidsai/cudf/pull/13870)) [@galipremsagar](https://github.com/galipremsagar)
- Fix List's missing children metadata in JSON writer ([#13869](https://github.com/rapidsai/cudf/pull/13869)) [@karthikeyann](https://github.com/karthikeyann)
- Disable construction of Index when `freq` is set in pandas-compatibility mode ([#13857](https://github.com/rapidsai/cudf/pull/13857)) [@galipremsagar](https://github.com/galipremsagar)
- Fix an issue with fetching `NA` from a `TimedeltaColumn` ([#13853](https://github.com/rapidsai/cudf/pull/13853)) [@galipremsagar](https://github.com/galipremsagar)
- Simplify implementation of interval_range() and fix behaviour for floating `freq` ([#13844](https://github.com/rapidsai/cudf/pull/13844)) [@shwina](https://github.com/shwina)
- Fix binary operations between `Series` and `Index` ([#13842](https://github.com/rapidsai/cudf/pull/13842)) [@galipremsagar](https://github.com/galipremsagar)
- Update make_lists_column_from_scalar to use make_offsets_child_column utility ([#13841](https://github.com/rapidsai/cudf/pull/13841)) [@davidwendt](https://github.com/davidwendt)
- Fix read out of bounds in string concatenate ([#13838](https://github.com/rapidsai/cudf/pull/13838)) [@pentschev](https://github.com/pentschev)
- Raise error for more cases when `timezone-aware` data is passed to `as_column` ([#13835](https://github.com/rapidsai/cudf/pull/13835)) [@galipremsagar](https://github.com/galipremsagar)
- Fix `any`, `all` reduction behavior for `axis=None` and warn for other reductions ([#13831](https://github.com/rapidsai/cudf/pull/13831)) [@galipremsagar](https://github.com/galipremsagar)
- Raise error when trying to construct time-zone aware timestamps ([#13830](https://github.com/rapidsai/cudf/pull/13830)) [@galipremsagar](https://github.com/galipremsagar)
- Fix cuFile I/O factories ([#13829](https://github.com/rapidsai/cudf/pull/13829)) [@vuule](https://github.com/vuule)
- DataFrame with namedtuples uses ._field as column names ([#13824](https://github.com/rapidsai/cudf/pull/13824)) [@mroeschke](https://github.com/mroeschke)
- Branch 23.10 merge 23.08 ([#13822](https://github.com/rapidsai/cudf/pull/13822)) [@vyasr](https://github.com/vyasr)
- Return a Series from JIT GroupBy apply, rather than a DataFrame ([#13820](https://github.com/rapidsai/cudf/pull/13820)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- No need to dlsym EnsureS3Finalized we can call it directly ([#13819](https://github.com/rapidsai/cudf/pull/13819)) [@robertmaynard](https://github.com/robertmaynard)
- Raise error when mixed types are being constructed ([#13816](https://github.com/rapidsai/cudf/pull/13816)) [@galipremsagar](https://github.com/galipremsagar)
- Fix unbounded sequence issue in `DataFrame` constructor ([#13811](https://github.com/rapidsai/cudf/pull/13811)) [@galipremsagar](https://github.com/galipremsagar)
- Fix Byte-Pair-Encoding usage of cuco static-map for storing merge-pairs ([#13807](https://github.com/rapidsai/cudf/pull/13807)) [@davidwendt](https://github.com/davidwendt)
- Fix for Parquet writer when requested pages per row is smaller than fragment size ([#13806](https://github.com/rapidsai/cudf/pull/13806)) [@etseidl](https://github.com/etseidl)
- Remove hangs from trying to construct un-bounded sequences ([#13799](https://github.com/rapidsai/cudf/pull/13799)) [@galipremsagar](https://github.com/galipremsagar)
- Bug/update libcudf to handle arrow12 changes ([#13794](https://github.com/rapidsai/cudf/pull/13794)) [@robertmaynard](https://github.com/robertmaynard)
- Update get_arrow to arrows 12 CMake target name of arrow::xsimd ([#13790](https://github.com/rapidsai/cudf/pull/13790)) [@robertmaynard](https://github.com/robertmaynard)
- Raise error when trying to join `datetime` and `timedelta` types with other types ([#13786](https://github.com/rapidsai/cudf/pull/13786)) [@galipremsagar](https://github.com/galipremsagar)
- Fix negative unary operation for boolean type ([#13780](https://github.com/rapidsai/cudf/pull/13780)) [@galipremsagar](https://github.com/galipremsagar)
- Fix contains(`in`) method for `Series` ([#13779](https://github.com/rapidsai/cudf/pull/13779)) [@galipremsagar](https://github.com/galipremsagar)
- Fix binary operation column ordering and missing column issues ([#13778](https://github.com/rapidsai/cudf/pull/13778)) [@galipremsagar](https://github.com/galipremsagar)
- Cast only time of day to nanos to avoid an overflow in Parquet INT96 write ([#13776](https://github.com/rapidsai/cudf/pull/13776)) [@gerashegalov](https://github.com/gerashegalov)
- Preserve names of column object in various APIs ([#13772](https://github.com/rapidsai/cudf/pull/13772)) [@galipremsagar](https://github.com/galipremsagar)
- Raise error on constructing an array from mixed type inputs ([#13768](https://github.com/rapidsai/cudf/pull/13768)) [@galipremsagar](https://github.com/galipremsagar)
- Fix construction of DataFrames from dict when columns are provided ([#13766](https://github.com/rapidsai/cudf/pull/13766)) [@wence-](https://github.com/wence-)
- Provide our own Cython declaration for make_unique ([#13746](https://github.com/rapidsai/cudf/pull/13746)) [@wence-](https://github.com/wence-)
## 📖 Documentation
- Fix typo in docstring: metadata. ([#14025](https://github.com/rapidsai/cudf/pull/14025)) [@bdice](https://github.com/bdice)
- Fix typo in parquet/page_decode.cuh ([#13849](https://github.com/rapidsai/cudf/pull/13849)) [@XinyuZeng](https://github.com/XinyuZeng)
- Simplify Python doc configuration ([#13826](https://github.com/rapidsai/cudf/pull/13826)) [@vyasr](https://github.com/vyasr)
- Update documentation to reflect recent changes in JSON reader and writer ([#13791](https://github.com/rapidsai/cudf/pull/13791)) [@vuule](https://github.com/vuule)
- Fix all warnings in Python docs ([#13789](https://github.com/rapidsai/cudf/pull/13789)) [@vyasr](https://github.com/vyasr)
## 🚀 New Features
- [Java] Add JNI bindings for `integers_to_hex` ([#14205](https://github.com/rapidsai/cudf/pull/14205)) [@razajafri](https://github.com/razajafri)
- Propagate errors from Parquet reader kernels back to host ([#14167](https://github.com/rapidsai/cudf/pull/14167)) [@vuule](https://github.com/vuule)
- JNI for `HISTOGRAM` and `MERGE_HISTOGRAM` aggregations ([#14154](https://github.com/rapidsai/cudf/pull/14154)) [@ttnghia](https://github.com/ttnghia)
- Expose streams in all public sorting APIs ([#14146](https://github.com/rapidsai/cudf/pull/14146)) [@vyasr](https://github.com/vyasr)
- Enable direct ingestion and production of Arrow scalars ([#14121](https://github.com/rapidsai/cudf/pull/14121)) [@vyasr](https://github.com/vyasr)
- Implement `GroupBy.value_counts` to match pandas API ([#14114](https://github.com/rapidsai/cudf/pull/14114)) [@stmio](https://github.com/stmio)
- Refactor parquet thrift reader ([#14097](https://github.com/rapidsai/cudf/pull/14097)) [@etseidl](https://github.com/etseidl)
- Refactor `hash_reduce_by_row` ([#14095](https://github.com/rapidsai/cudf/pull/14095)) [@ttnghia](https://github.com/ttnghia)
- Support negative preceding/following for ROW window functions ([#14093](https://github.com/rapidsai/cudf/pull/14093)) [@mythrocks](https://github.com/mythrocks)
- Support for progressive parquet chunked reading. ([#14079](https://github.com/rapidsai/cudf/pull/14079)) [@nvdbaranec](https://github.com/nvdbaranec)
- Implement `HISTOGRAM` and `MERGE_HISTOGRAM` aggregations ([#14045](https://github.com/rapidsai/cudf/pull/14045)) [@ttnghia](https://github.com/ttnghia)
- Expose streams in public search APIs ([#14034](https://github.com/rapidsai/cudf/pull/14034)) [@vyasr](https://github.com/vyasr)
- Expose streams in public replace APIs ([#14010](https://github.com/rapidsai/cudf/pull/14010)) [@vyasr](https://github.com/vyasr)
- Add stream parameter to public cudf::strings::split APIs ([#13997](https://github.com/rapidsai/cudf/pull/13997)) [@davidwendt](https://github.com/davidwendt)
- Expose streams in public filling APIs ([#13990](https://github.com/rapidsai/cudf/pull/13990)) [@vyasr](https://github.com/vyasr)
- Expose streams in public concatenate APIs ([#13987](https://github.com/rapidsai/cudf/pull/13987)) [@vyasr](https://github.com/vyasr)
- Use HostMemoryAllocator in jni::allocate_host_buffer ([#13975](https://github.com/rapidsai/cudf/pull/13975)) [@gerashegalov](https://github.com/gerashegalov)
- Enable fractional null probability for hashing benchmark ([#13967](https://github.com/rapidsai/cudf/pull/13967)) [@Blonck](https://github.com/Blonck)
- Switch pylibcudf-enabled types to use enum class in Cython ([#13931](https://github.com/rapidsai/cudf/pull/13931)) [@vyasr](https://github.com/vyasr)
- Add nvtext::tokenize_with_vocabulary API ([#13930](https://github.com/rapidsai/cudf/pull/13930)) [@davidwendt](https://github.com/davidwendt)
- Rewrite `DataFrame.stack` to support multi level column names ([#13927](https://github.com/rapidsai/cudf/pull/13927)) [@isVoid](https://github.com/isVoid)
- Add HostMemoryAllocator interface ([#13924](https://github.com/rapidsai/cudf/pull/13924)) [@gerashegalov](https://github.com/gerashegalov)
- Global stream pool ([#13922](https://github.com/rapidsai/cudf/pull/13922)) [@etseidl](https://github.com/etseidl)
- Create table_input_metadata from a table_metadata ([#13920](https://github.com/rapidsai/cudf/pull/13920)) [@etseidl](https://github.com/etseidl)
- Translate column size overflow exception to JNI ([#13911](https://github.com/rapidsai/cudf/pull/13911)) [@mythrocks](https://github.com/mythrocks)
- Enable RLE boolean encoding for v2 Parquet files ([#13886](https://github.com/rapidsai/cudf/pull/13886)) [@etseidl](https://github.com/etseidl)
- Exclude some tests from running with the compute sanitizer ([#13872](https://github.com/rapidsai/cudf/pull/13872)) [@firestarman](https://github.com/firestarman)
- Expand statistics support in ORC writer ([#13848](https://github.com/rapidsai/cudf/pull/13848)) [@vuule](https://github.com/vuule)
- Register the memory mapped buffer in `datasource` to improve H2D throughput ([#13814](https://github.com/rapidsai/cudf/pull/13814)) [@vuule](https://github.com/vuule)
- Add cudf::strings::find function with target per row ([#13808](https://github.com/rapidsai/cudf/pull/13808)) [@davidwendt](https://github.com/davidwendt)
- Add minhash support for MurmurHash3_x64_128 ([#13796](https://github.com/rapidsai/cudf/pull/13796)) [@davidwendt](https://github.com/davidwendt)
- Remove unnecessary pointer copying in JIT GroupBy Apply ([#13792](https://github.com/rapidsai/cudf/pull/13792)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Add 'poll' function to custreamz kafka consumer ([#13782](https://github.com/rapidsai/cudf/pull/13782)) [@jdye64](https://github.com/jdye64)
- Support `corr` in `GroupBy.apply` through the jit engine ([#13767](https://github.com/rapidsai/cudf/pull/13767)) [@shwina](https://github.com/shwina)
- Optionally write version 2 page headers in Parquet writer ([#13751](https://github.com/rapidsai/cudf/pull/13751)) [@etseidl](https://github.com/etseidl)
- Support more numeric types in `Groupby.apply` with `engine='jit'` ([#13729](https://github.com/rapidsai/cudf/pull/13729)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- [FEA] Add DELTA_BINARY_PACKED decoding support to Parquet reader ([#13637](https://github.com/rapidsai/cudf/pull/13637)) [@etseidl](https://github.com/etseidl)
- Read FIXED_LEN_BYTE_ARRAY as binary in parquet reader ([#13437](https://github.com/rapidsai/cudf/pull/13437)) [@PointKernel](https://github.com/PointKernel)
## 🛠️ Improvements
- Pin `dask` and `distributed` for `23.10` release ([#14225](https://github.com/rapidsai/cudf/pull/14225)) [@galipremsagar](https://github.com/galipremsagar)
- update rmm tag path ([#14195](https://github.com/rapidsai/cudf/pull/14195)) [@AyodeAwe](https://github.com/AyodeAwe)
- Disable `Recently Updated` Check ([#14193](https://github.com/rapidsai/cudf/pull/14193)) [@ajschmidt8](https://github.com/ajschmidt8)
- Move cpp/src/hash/hash_allocator.cuh to include/cudf/hashing/detail ([#14163](https://github.com/rapidsai/cudf/pull/14163)) [@davidwendt](https://github.com/davidwendt)
- Add Parquet reader benchmarks for row selection ([#14147](https://github.com/rapidsai/cudf/pull/14147)) [@vuule](https://github.com/vuule)
- Update image names ([#14145](https://github.com/rapidsai/cudf/pull/14145)) [@AyodeAwe](https://github.com/AyodeAwe)
- Support callables in DataFrame.assign ([#14142](https://github.com/rapidsai/cudf/pull/14142)) [@wence-](https://github.com/wence-)
- Reduce memory usage of as_categorical_column ([#14138](https://github.com/rapidsai/cudf/pull/14138)) [@wence-](https://github.com/wence-)
- Replace Python scalar conversions with libcudf ([#14124](https://github.com/rapidsai/cudf/pull/14124)) [@vyasr](https://github.com/vyasr)
- Update to clang 16.0.6. ([#14120](https://github.com/rapidsai/cudf/pull/14120)) [@bdice](https://github.com/bdice)
- Fix type of empty `Index` and raise warning in `Series` constructor ([#14116](https://github.com/rapidsai/cudf/pull/14116)) [@galipremsagar](https://github.com/galipremsagar)
- Add stream parameter to external dict APIs ([#14115](https://github.com/rapidsai/cudf/pull/14115)) [@SurajAralihalli](https://github.com/SurajAralihalli)
- Add fallback matrix for nvcomp. ([#14082](https://github.com/rapidsai/cudf/pull/14082)) [@bdice](https://github.com/bdice)
- [Java] Add recoverWithNull to JSONOptions and pass to Table.readJSON ([#14078](https://github.com/rapidsai/cudf/pull/14078)) [@andygrove](https://github.com/andygrove)
- Remove header tests ([#14072](https://github.com/rapidsai/cudf/pull/14072)) [@ajschmidt8](https://github.com/ajschmidt8)
- Refactor `contains_table` with cuco::static_set ([#14064](https://github.com/rapidsai/cudf/pull/14064)) [@PointKernel](https://github.com/PointKernel)
- Remove debug print in a Parquet test ([#14063](https://github.com/rapidsai/cudf/pull/14063)) [@vuule](https://github.com/vuule)
- Expose stream parameter in public nvtext ngram APIs ([#14061](https://github.com/rapidsai/cudf/pull/14061)) [@davidwendt](https://github.com/davidwendt)
- Expose stream parameter in public strings find APIs ([#14060](https://github.com/rapidsai/cudf/pull/14060)) [@davidwendt](https://github.com/davidwendt)
- Update doxygen to 1.9.1 ([#14059](https://github.com/rapidsai/cudf/pull/14059)) [@vyasr](https://github.com/vyasr)
- Remove the mr from the base fixture ([#14057](https://github.com/rapidsai/cudf/pull/14057)) [@vyasr](https://github.com/vyasr)
- Expose streams in public strings case APIs ([#14056](https://github.com/rapidsai/cudf/pull/14056)) [@davidwendt](https://github.com/davidwendt)
- Refactor libcudf indexalator to typed normalator ([#14043](https://github.com/rapidsai/cudf/pull/14043)) [@davidwendt](https://github.com/davidwendt)
- Use cudf::make_empty_column instead of column_view constructor ([#14030](https://github.com/rapidsai/cudf/pull/14030)) [@davidwendt](https://github.com/davidwendt)
- Remove quadratic runtime due to accessing Frame._dtypes in loop ([#14028](https://github.com/rapidsai/cudf/pull/14028)) [@wence-](https://github.com/wence-)
- Explicitly depend on zlib in conda recipes ([#14018](https://github.com/rapidsai/cudf/pull/14018)) [@wence-](https://github.com/wence-)
- Use grid_stride for stride computations. ([#13996](https://github.com/rapidsai/cudf/pull/13996)) [@bdice](https://github.com/bdice)
- Fix an issue where casting null-array to `object` dtype will result in a failure ([#13994](https://github.com/rapidsai/cudf/pull/13994)) [@galipremsagar](https://github.com/galipremsagar)
- Add tab as literal to cudf::test::to_string output ([#13993](https://github.com/rapidsai/cudf/pull/13993)) [@davidwendt](https://github.com/davidwendt)
- Enable `codes` dtype parity in pandas-compatibility mode for `factorize` API ([#13982](https://github.com/rapidsai/cudf/pull/13982)) [@galipremsagar](https://github.com/galipremsagar)
- Fix `CategoricalIndex` ordering in `Groupby.agg` when pandas-compatibility mode is enabled ([#13978](https://github.com/rapidsai/cudf/pull/13978)) [@galipremsagar](https://github.com/galipremsagar)
- Produce a fatal error if cudf is unable to find pyarrow include directory ([#13976](https://github.com/rapidsai/cudf/pull/13976)) [@cwharris](https://github.com/cwharris)
- Use `thread_index_type` in `partitioning.cu` ([#13973](https://github.com/rapidsai/cudf/pull/13973)) [@divyegala](https://github.com/divyegala)
- Use `cudf::thread_index_type` in `merge.cu` ([#13972](https://github.com/rapidsai/cudf/pull/13972)) [@divyegala](https://github.com/divyegala)
- Use `copy-pr-bot` ([#13970](https://github.com/rapidsai/cudf/pull/13970)) [@ajschmidt8](https://github.com/ajschmidt8)
- Use cudf::thread_index_type in strings custom kernels ([#13968](https://github.com/rapidsai/cudf/pull/13968)) [@davidwendt](https://github.com/davidwendt)
- Add `bytes_per_second` to hash_partition benchmark ([#13965](https://github.com/rapidsai/cudf/pull/13965)) [@Blonck](https://github.com/Blonck)
- Added pinned pool reservation API for java ([#13964](https://github.com/rapidsai/cudf/pull/13964)) [@revans2](https://github.com/revans2)
- Simplify wheel build scripts and allow alphas of RAPIDS dependencies ([#13963](https://github.com/rapidsai/cudf/pull/13963)) [@vyasr](https://github.com/vyasr)
- Add `bytes_per_second` to copy_if_else benchmark ([#13960](https://github.com/rapidsai/cudf/pull/13960)) [@Blonck](https://github.com/Blonck)
- Add pandas compatible output to `Series.unique` ([#13959](https://github.com/rapidsai/cudf/pull/13959)) [@galipremsagar](https://github.com/galipremsagar)
- Add `bytes_per_second` to compiled binaryop benchmark ([#13938](https://github.com/rapidsai/cudf/pull/13938)) [@Blonck](https://github.com/Blonck)
- Unpin `dask` and `distributed` for `23.10` development ([#13935](https://github.com/rapidsai/cudf/pull/13935)) [@galipremsagar](https://github.com/galipremsagar)
- Make HostColumnVector.getRefCount public ([#13934](https://github.com/rapidsai/cudf/pull/13934)) [@abellina](https://github.com/abellina)
- Use cuco::static_set in JSON tree algorithm ([#13928](https://github.com/rapidsai/cudf/pull/13928)) [@karthikeyann](https://github.com/karthikeyann)
- Add java API to get size of host memory needed to copy column view ([#13919](https://github.com/rapidsai/cudf/pull/13919)) [@revans2](https://github.com/revans2)
- Use cudf::size_type instead of int32 where appropriate in nvtext functions ([#13915](https://github.com/rapidsai/cudf/pull/13915)) [@davidwendt](https://github.com/davidwendt)
- Enable hugepage for arrow host allocations ([#13914](https://github.com/rapidsai/cudf/pull/13914)) [@madsbk](https://github.com/madsbk)
- Improve performance of nvtext::edit_distance ([#13912](https://github.com/rapidsai/cudf/pull/13912)) [@davidwendt](https://github.com/davidwendt)
- Ensure cudf internals use pylibcudf in pure Python mode ([#13909](https://github.com/rapidsai/cudf/pull/13909)) [@vyasr](https://github.com/vyasr)
- Use `empty()` instead of `size()` where possible ([#13908](https://github.com/rapidsai/cudf/pull/13908)) [@vuule](https://github.com/vuule)
- [JNI] Adds HostColumnVector.EventHandler for spillability checks ([#13898](https://github.com/rapidsai/cudf/pull/13898)) [@abellina](https://github.com/abellina)
- Return `Timestamp` & `Timedelta` for fetching scalars in `DatetimeIndex` & `TimedeltaIndex` ([#13896](https://github.com/rapidsai/cudf/pull/13896)) [@galipremsagar](https://github.com/galipremsagar)
- Allow explicit `shuffle="p2p"` within dask-cudf API ([#13893](https://github.com/rapidsai/cudf/pull/13893)) [@rjzamora](https://github.com/rjzamora)
- Disable creation of `DatetimeIndex` when `freq` is passed to `cudf.date_range` ([#13890](https://github.com/rapidsai/cudf/pull/13890)) [@galipremsagar](https://github.com/galipremsagar)
- Bring parity with pandas for `datetime` & `timedelta` comparison operations ([#13877](https://github.com/rapidsai/cudf/pull/13877)) [@galipremsagar](https://github.com/galipremsagar)
- Change `NA` to `NaT` for `datetime` and `timedelta` types ([#13868](https://github.com/rapidsai/cudf/pull/13868)) [@galipremsagar](https://github.com/galipremsagar)
- Raise error when `astype(object)` is called in pandas compatibility mode ([#13862](https://github.com/rapidsai/cudf/pull/13862)) [@galipremsagar](https://github.com/galipremsagar)
- Fixes a performance regression in FST ([#13850](https://github.com/rapidsai/cudf/pull/13850)) [@elstehle](https://github.com/elstehle)
- Set native handles to null on close in Java wrapper classes ([#13818](https://github.com/rapidsai/cudf/pull/13818)) [@jlowe](https://github.com/jlowe)
- Avoid use of CUDF_EXPECTS in libcudf unit tests outside of helper functions with return values ([#13812](https://github.com/rapidsai/cudf/pull/13812)) [@vuule](https://github.com/vuule)
- Update `lists::contains` to experimental row comparator ([#13810](https://github.com/rapidsai/cudf/pull/13810)) [@divyegala](https://github.com/divyegala)
- Reduce `lists::contains` dispatches for scalars ([#13805](https://github.com/rapidsai/cudf/pull/13805)) [@divyegala](https://github.com/divyegala)
- Long string optimization for string column parsing in JSON reader ([#13803](https://github.com/rapidsai/cudf/pull/13803)) [@karthikeyann](https://github.com/karthikeyann)
- Raise NotImplementedError for pd.SparseDtype ([#13798](https://github.com/rapidsai/cudf/pull/13798)) [@mroeschke](https://github.com/mroeschke)
- Remove the libcudf cudf::offset_type type ([#13788](https://github.com/rapidsai/cudf/pull/13788)) [@davidwendt](https://github.com/davidwendt)
- Move Spark-indpendent Table debug to cudf Java ([#13783](https://github.com/rapidsai/cudf/pull/13783)) [@gerashegalov](https://github.com/gerashegalov)
- Update to Cython 3.0.0 ([#13777](https://github.com/rapidsai/cudf/pull/13777)) [@vyasr](https://github.com/vyasr)
- Refactor Parquet reader handling of V2 page header info ([#13775](https://github.com/rapidsai/cudf/pull/13775)) [@etseidl](https://github.com/etseidl)
- Branch 23.10 merge 23.08 ([#13773](https://github.com/rapidsai/cudf/pull/13773)) [@vyasr](https://github.com/vyasr)
- Restructure JSON code to correctly reflect legacy/experimental status ([#13757](https://github.com/rapidsai/cudf/pull/13757)) [@vuule](https://github.com/vuule)
- Branch 23.10 merge 23.08 ([#13753](https://github.com/rapidsai/cudf/pull/13753)) [@vyasr](https://github.com/vyasr)
- Enforce deprecations in `23.10` ([#13732](https://github.com/rapidsai/cudf/pull/13732)) [@galipremsagar](https://github.com/galipremsagar)
- Upgrade to arrow 12 ([#13728](https://github.com/rapidsai/cudf/pull/13728)) [@galipremsagar](https://github.com/galipremsagar)
- Refactors JSON reader's pushdown automaton ([#13716](https://github.com/rapidsai/cudf/pull/13716)) [@elstehle](https://github.com/elstehle)
- Remove Arrow dependency from the `datasource.hpp` public header ([#13698](https://github.com/rapidsai/cudf/pull/13698)) [@vuule](https://github.com/vuule)
# cuDF 23.08.00 (9 Aug 2023)
## 🚨 Breaking Changes
- Enforce deprecations and add clarifications around existing deprecations ([#13710](https://github.com/rapidsai/cudf/pull/13710)) [@galipremsagar](https://github.com/galipremsagar)
- Separate MurmurHash32 from hash_functions.cuh ([#13681](https://github.com/rapidsai/cudf/pull/13681)) [@davidwendt](https://github.com/davidwendt)
- Avoid storing metadata in pointers in ORC and Parquet writers ([#13648](https://github.com/rapidsai/cudf/pull/13648)) [@vuule](https://github.com/vuule)
- Expose streams in all public copying APIs ([#13629](https://github.com/rapidsai/cudf/pull/13629)) [@vyasr](https://github.com/vyasr)
- Remove deprecated cudf::strings::slice_strings (by delimiter) functions ([#13628](https://github.com/rapidsai/cudf/pull/13628)) [@davidwendt](https://github.com/davidwendt)
- Remove deprecated cudf.set_allocator. ([#13591](https://github.com/rapidsai/cudf/pull/13591)) [@bdice](https://github.com/bdice)
- Change build.sh to use pip install instead of setup.py ([#13507](https://github.com/rapidsai/cudf/pull/13507)) [@vyasr](https://github.com/vyasr)
- Remove unused max_rows_tensor parameter from subword tokenizer ([#13463](https://github.com/rapidsai/cudf/pull/13463)) [@davidwendt](https://github.com/davidwendt)
- Fix decimal scale reductions in `_get_decimal_type` ([#13224](https://github.com/rapidsai/cudf/pull/13224)) [@charlesbluca](https://github.com/charlesbluca)
## 🐛 Bug Fixes
- Add CUDA version to cudf_kafka and libcudf-example build strings. ([#13769](https://github.com/rapidsai/cudf/pull/13769)) [@bdice](https://github.com/bdice)
- Fix typo in wheels-test.yaml. ([#13763](https://github.com/rapidsai/cudf/pull/13763)) [@bdice](https://github.com/bdice)
- Don't test strings shorter than the requested ngram size ([#13758](https://github.com/rapidsai/cudf/pull/13758)) [@vyasr](https://github.com/vyasr)
- Add CUDA version to custreamz build string. ([#13754](https://github.com/rapidsai/cudf/pull/13754)) [@bdice](https://github.com/bdice)
- Fix writing of ORC files with empty child string columns ([#13745](https://github.com/rapidsai/cudf/pull/13745)) [@vuule](https://github.com/vuule)
- Remove the erroneous "empty level" short-circuit from ORC reader ([#13722](https://github.com/rapidsai/cudf/pull/13722)) [@vuule](https://github.com/vuule)
- Fix character counting when writing sliced tables into ORC ([#13721](https://github.com/rapidsai/cudf/pull/13721)) [@vuule](https://github.com/vuule)
- Parquet uses row group row count if missing from header ([#13712](https://github.com/rapidsai/cudf/pull/13712)) [@hyperbolic2346](https://github.com/hyperbolic2346)
- Fix reading of RLE encoded boolean data from parquet files with V2 page headers ([#13707](https://github.com/rapidsai/cudf/pull/13707)) [@etseidl](https://github.com/etseidl)
- Fix a corner case of list lexicographic comparator ([#13701](https://github.com/rapidsai/cudf/pull/13701)) [@ttnghia](https://github.com/ttnghia)
- Fix combined filtering and column projection in `dask_cudf.read_parquet` ([#13697](https://github.com/rapidsai/cudf/pull/13697)) [@rjzamora](https://github.com/rjzamora)
- Revert fetch-rapids changes ([#13696](https://github.com/rapidsai/cudf/pull/13696)) [@vyasr](https://github.com/vyasr)
- Data generator - include offsets in the size estimate of list elments ([#13688](https://github.com/rapidsai/cudf/pull/13688)) [@vuule](https://github.com/vuule)
- Add `cuda-nvcc-impl` to `cudf` for `numba` CUDA 12 ([#13673](https://github.com/rapidsai/cudf/pull/13673)) [@jakirkham](https://github.com/jakirkham)
- Fix combined filtering and column projection in `read_parquet` ([#13666](https://github.com/rapidsai/cudf/pull/13666)) [@rjzamora](https://github.com/rjzamora)
- Use `thrust::identity` as hash functions for byte pair encoding ([#13665](https://github.com/rapidsai/cudf/pull/13665)) [@PointKernel](https://github.com/PointKernel)
- Fix loc-getitem ordering when index contains duplicate labels ([#13659](https://github.com/rapidsai/cudf/pull/13659)) [@wence-](https://github.com/wence-)
- [REVIEW] Introduce parity with pandas for `MultiIndex.loc` ordering & fix a bug in `Groupby` with `as_index` ([#13657](https://github.com/rapidsai/cudf/pull/13657)) [@galipremsagar](https://github.com/galipremsagar)
- Fix memcheck error found in nvtext tokenize functions ([#13649](https://github.com/rapidsai/cudf/pull/13649)) [@davidwendt](https://github.com/davidwendt)
- Fix `has_nonempty_nulls` ignoring column offset ([#13647](https://github.com/rapidsai/cudf/pull/13647)) [@ttnghia](https://github.com/ttnghia)
- [Java] Avoid double-free corruption in case of an Exception while creating a ColumnView ([#13645](https://github.com/rapidsai/cudf/pull/13645)) [@razajafri](https://github.com/razajafri)
- Fix memcheck error in ORC reader call to cudf::io::copy_uncompressed_kernel ([#13643](https://github.com/rapidsai/cudf/pull/13643)) [@davidwendt](https://github.com/davidwendt)
- Fix CUDA 12 conda environment to remove cubinlinker and ptxcompiler. ([#13636](https://github.com/rapidsai/cudf/pull/13636)) [@bdice](https://github.com/bdice)
- Fix inf/NaN comparisons for FLOAT orderby in window functions ([#13635](https://github.com/rapidsai/cudf/pull/13635)) [@mythrocks](https://github.com/mythrocks)
- Refactor `Index` search to simplify code and increase correctness ([#13625](https://github.com/rapidsai/cudf/pull/13625)) [@wence-](https://github.com/wence-)
- Fix compile warning for unused variable in split_re.cu ([#13621](https://github.com/rapidsai/cudf/pull/13621)) [@davidwendt](https://github.com/davidwendt)
- Fix tz_localize for dask_cudf Series ([#13610](https://github.com/rapidsai/cudf/pull/13610)) [@shwina](https://github.com/shwina)
- Fix issue with no decompressed data in ORC reader ([#13609](https://github.com/rapidsai/cudf/pull/13609)) [@vuule](https://github.com/vuule)
- Fix floating point window range extents. ([#13606](https://github.com/rapidsai/cudf/pull/13606)) [@mythrocks](https://github.com/mythrocks)
- Fix `localize(None)` for timezone-naive columns ([#13603](https://github.com/rapidsai/cudf/pull/13603)) [@shwina](https://github.com/shwina)
- Fixed a memory leak caused by Exception thrown while constructing a ColumnView ([#13597](https://github.com/rapidsai/cudf/pull/13597)) [@razajafri](https://github.com/razajafri)
- Handle nullptr return value from bitmask_or in distinct_count ([#13590](https://github.com/rapidsai/cudf/pull/13590)) [@wence-](https://github.com/wence-)
- Bring parity with pandas in Index.join ([#13589](https://github.com/rapidsai/cudf/pull/13589)) [@galipremsagar](https://github.com/galipremsagar)
- Fix cudf.melt when there are more than 255 columns ([#13588](https://github.com/rapidsai/cudf/pull/13588)) [@hcho3](https://github.com/hcho3)
- Fix memory issues in cuIO due to removal of memory padding ([#13586](https://github.com/rapidsai/cudf/pull/13586)) [@ttnghia](https://github.com/ttnghia)
- Fix Parquet multi-file reading ([#13584](https://github.com/rapidsai/cudf/pull/13584)) [@etseidl](https://github.com/etseidl)
- Fix memcheck error found in LISTS_TEST ([#13579](https://github.com/rapidsai/cudf/pull/13579)) [@davidwendt](https://github.com/davidwendt)
- Fix memcheck error found in STRINGS_TEST ([#13578](https://github.com/rapidsai/cudf/pull/13578)) [@davidwendt](https://github.com/davidwendt)
- Fix memcheck error found in INTEROP_TEST ([#13577](https://github.com/rapidsai/cudf/pull/13577)) [@davidwendt](https://github.com/davidwendt)
- Fix memcheck errors found in REDUCTION_TEST ([#13574](https://github.com/rapidsai/cudf/pull/13574)) [@davidwendt](https://github.com/davidwendt)
- Preemptive fix for hive-partitioning change in dask ([#13564](https://github.com/rapidsai/cudf/pull/13564)) [@rjzamora](https://github.com/rjzamora)
- Fix an issue with `dask_cudf.read_csv` when lines are needed to be skipped ([#13555](https://github.com/rapidsai/cudf/pull/13555)) [@galipremsagar](https://github.com/galipremsagar)
- Fix out-of-bounds memory write in cudf::dictionary::detail::concatenate ([#13554](https://github.com/rapidsai/cudf/pull/13554)) [@davidwendt](https://github.com/davidwendt)
- Fix the null mask size in json reader ([#13537](https://github.com/rapidsai/cudf/pull/13537)) [@karthikeyann](https://github.com/karthikeyann)
- Fix cudf::strings::strip for all-empty input column ([#13533](https://github.com/rapidsai/cudf/pull/13533)) [@davidwendt](https://github.com/davidwendt)
- Make sure to build without isolation or installing dependencies ([#13524](https://github.com/rapidsai/cudf/pull/13524)) [@vyasr](https://github.com/vyasr)
- Remove preload lib from CMake for now ([#13519](https://github.com/rapidsai/cudf/pull/13519)) [@vyasr](https://github.com/vyasr)
- Fix missing separator after null values in JSON writer ([#13503](https://github.com/rapidsai/cudf/pull/13503)) [@karthikeyann](https://github.com/karthikeyann)
- Ensure `single_lane_block_sum_reduce` is safe to call in a loop ([#13488](https://github.com/rapidsai/cudf/pull/13488)) [@wence-](https://github.com/wence-)
- Update all versions in pyproject.toml files. ([#13486](https://github.com/rapidsai/cudf/pull/13486)) [@bdice](https://github.com/bdice)
- Remove applying nvbench that doesn't exist in 23.08 ([#13484](https://github.com/rapidsai/cudf/pull/13484)) [@robertmaynard](https://github.com/robertmaynard)
- Fix chunked Parquet reader benchmark ([#13482](https://github.com/rapidsai/cudf/pull/13482)) [@vuule](https://github.com/vuule)
- Update JNI JSON reader column compatability for Spark ([#13477](https://github.com/rapidsai/cudf/pull/13477)) [@revans2](https://github.com/revans2)
- Fix unsanitized output of scan with strings ([#13455](https://github.com/rapidsai/cudf/pull/13455)) [@davidwendt](https://github.com/davidwendt)
- Reject functions without bytecode from `_can_be_jitted` in GroupBy Apply ([#13429](https://github.com/rapidsai/cudf/pull/13429)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Fix decimal scale reductions in `_get_decimal_type` ([#13224](https://github.com/rapidsai/cudf/pull/13224)) [@charlesbluca](https://github.com/charlesbluca)
## 📖 Documentation
- Fix doxygen groups for io data sources and sinks ([#13718](https://github.com/rapidsai/cudf/pull/13718)) [@davidwendt](https://github.com/davidwendt)
- Add pandas compatibility note to DataFrame.query docstring ([#13693](https://github.com/rapidsai/cudf/pull/13693)) [@beckernick](https://github.com/beckernick)
- Add pylibcudf to developer guide ([#13639](https://github.com/rapidsai/cudf/pull/13639)) [@vyasr](https://github.com/vyasr)
- Fix repeated words in doxygen text ([#13598](https://github.com/rapidsai/cudf/pull/13598)) [@karthikeyann](https://github.com/karthikeyann)
- Update docs for top-level API. ([#13592](https://github.com/rapidsai/cudf/pull/13592)) [@bdice](https://github.com/bdice)
- Fix the the doxygen text for cudf::concatenate and other places ([#13561](https://github.com/rapidsai/cudf/pull/13561)) [@davidwendt](https://github.com/davidwendt)
- Document stream validation approach used in testing ([#13556](https://github.com/rapidsai/cudf/pull/13556)) [@vyasr](https://github.com/vyasr)
- Cleanup doc repetitions in libcudf ([#13470](https://github.com/rapidsai/cudf/pull/13470)) [@karthikeyann](https://github.com/karthikeyann)
## 🚀 New Features
- Support `min` and `max` aggregations for list type in groupby and reduction ([#13676](https://github.com/rapidsai/cudf/pull/13676)) [@ttnghia](https://github.com/ttnghia)
- Add nvtext::jaccard_index API for strings columns ([#13669](https://github.com/rapidsai/cudf/pull/13669)) [@davidwendt](https://github.com/davidwendt)
- Add read_parquet_metadata libcudf API ([#13663](https://github.com/rapidsai/cudf/pull/13663)) [@karthikeyann](https://github.com/karthikeyann)
- Expose streams in all public copying APIs ([#13629](https://github.com/rapidsai/cudf/pull/13629)) [@vyasr](https://github.com/vyasr)
- Add XXHash_64 hash function to cudf ([#13612](https://github.com/rapidsai/cudf/pull/13612)) [@davidwendt](https://github.com/davidwendt)
- Java support: Floating point order-by columns for RANGE window functions ([#13595](https://github.com/rapidsai/cudf/pull/13595)) [@mythrocks](https://github.com/mythrocks)
- Use `cuco::static_map` to build string dictionaries in ORC writer ([#13580](https://github.com/rapidsai/cudf/pull/13580)) [@vuule](https://github.com/vuule)
- Add pylibcudf subpackage with gather implementation ([#13562](https://github.com/rapidsai/cudf/pull/13562)) [@vyasr](https://github.com/vyasr)
- Add JNI for `lists::concatenate_list_elements` ([#13547](https://github.com/rapidsai/cudf/pull/13547)) [@ttnghia](https://github.com/ttnghia)
- Enable nested types for `lists::concatenate_list_elements` ([#13545](https://github.com/rapidsai/cudf/pull/13545)) [@ttnghia](https://github.com/ttnghia)
- Add unicode encoding for string columns in JSON writer ([#13539](https://github.com/rapidsai/cudf/pull/13539)) [@karthikeyann](https://github.com/karthikeyann)
- Remove numba kernels from `find_index_of_val` ([#13517](https://github.com/rapidsai/cudf/pull/13517)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Floating point order-by columns for RANGE window functions ([#13512](https://github.com/rapidsai/cudf/pull/13512)) [@mythrocks](https://github.com/mythrocks)
- Parse column chunk metadata statistics in parquet reader ([#13472](https://github.com/rapidsai/cudf/pull/13472)) [@karthikeyann](https://github.com/karthikeyann)
- Add `abs` function to apply ([#13408](https://github.com/rapidsai/cudf/pull/13408)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- [FEA] AST filtering in parquet reader ([#13348](https://github.com/rapidsai/cudf/pull/13348)) [@karthikeyann](https://github.com/karthikeyann)
- [FEA] Adds option to recover from invalid JSON lines in JSON tokenizer ([#13344](https://github.com/rapidsai/cudf/pull/13344)) [@elstehle](https://github.com/elstehle)
- Ensure cccl packages don't clash with upstream version ([#13235](https://github.com/rapidsai/cudf/pull/13235)) [@robertmaynard](https://github.com/robertmaynard)
- Update `struct_minmax_util` to experimental row comparator ([#13069](https://github.com/rapidsai/cudf/pull/13069)) [@divyegala](https://github.com/divyegala)
- Add stream parameter to hashing APIs ([#12090](https://github.com/rapidsai/cudf/pull/12090)) [@vyasr](https://github.com/vyasr)
## 🛠️ Improvements
- Pin `dask` and `distributed` for `23.08` release ([#13802](https://github.com/rapidsai/cudf/pull/13802)) [@galipremsagar](https://github.com/galipremsagar)
- Relax protobuf pinnings. ([#13770](https://github.com/rapidsai/cudf/pull/13770)) [@bdice](https://github.com/bdice)
- Switch fully unbounded window functions to use aggregations ([#13727](https://github.com/rapidsai/cudf/pull/13727)) [@mythrocks](https://github.com/mythrocks)
- Switch to new wheel building pipeline ([#13723](https://github.com/rapidsai/cudf/pull/13723)) [@vyasr](https://github.com/vyasr)
- Revert CUDA 12.0 CI workflows to branch-23.08. ([#13719](https://github.com/rapidsai/cudf/pull/13719)) [@bdice](https://github.com/bdice)
- Adding identify minimum version requirement ([#13713](https://github.com/rapidsai/cudf/pull/13713)) [@hyperbolic2346](https://github.com/hyperbolic2346)
- Enforce deprecations and add clarifications around existing deprecations ([#13710](https://github.com/rapidsai/cudf/pull/13710)) [@galipremsagar](https://github.com/galipremsagar)
- Optimize ORC reader performance for list data ([#13708](https://github.com/rapidsai/cudf/pull/13708)) [@vyasr](https://github.com/vyasr)
- fix limit overflow message in a docstring ([#13703](https://github.com/rapidsai/cudf/pull/13703)) [@ahmet-uyar](https://github.com/ahmet-uyar)
- Alleviates JSON parser's need for multi-file sources to end with a newline ([#13702](https://github.com/rapidsai/cudf/pull/13702)) [@elstehle](https://github.com/elstehle)
- Update cython-lint and replace flake8 with ruff ([#13699](https://github.com/rapidsai/cudf/pull/13699)) [@vyasr](https://github.com/vyasr)
- Add `__dask_tokenize__` definitions to cudf classes ([#13695](https://github.com/rapidsai/cudf/pull/13695)) [@rjzamora](https://github.com/rjzamora)
- Convert libcudf hashing benchmarks to nvbench ([#13694](https://github.com/rapidsai/cudf/pull/13694)) [@davidwendt](https://github.com/davidwendt)
- Separate MurmurHash32 from hash_functions.cuh ([#13681](https://github.com/rapidsai/cudf/pull/13681)) [@davidwendt](https://github.com/davidwendt)
- Improve performance of cudf::strings::split on whitespace ([#13680](https://github.com/rapidsai/cudf/pull/13680)) [@davidwendt](https://github.com/davidwendt)
- Allow ORC and Parquet writers to write nullable columns without nulls as non-nullable ([#13675](https://github.com/rapidsai/cudf/pull/13675)) [@vuule](https://github.com/vuule)
- Raise a NotImplementedError in to_datetime when utc is passed ([#13670](https://github.com/rapidsai/cudf/pull/13670)) [@shwina](https://github.com/shwina)
- Add rmm_mode parameter to nvbench base fixture ([#13668](https://github.com/rapidsai/cudf/pull/13668)) [@davidwendt](https://github.com/davidwendt)
- Fix multiindex loc ordering in pandas-compat mode ([#13660](https://github.com/rapidsai/cudf/pull/13660)) [@wence-](https://github.com/wence-)
- Add nvtext hash_character_ngrams function ([#13654](https://github.com/rapidsai/cudf/pull/13654)) [@davidwendt](https://github.com/davidwendt)
- Avoid storing metadata in pointers in ORC and Parquet writers ([#13648](https://github.com/rapidsai/cudf/pull/13648)) [@vuule](https://github.com/vuule)
- Acquire spill lock in to/from_arrow ([#13646](https://github.com/rapidsai/cudf/pull/13646)) [@shwina](https://github.com/shwina)
- Expose stable versions of libcudf sort routines ([#13634](https://github.com/rapidsai/cudf/pull/13634)) [@wence-](https://github.com/wence-)
- Separate out hash_test.cpp source for each hash API ([#13633](https://github.com/rapidsai/cudf/pull/13633)) [@davidwendt](https://github.com/davidwendt)
- Remove deprecated cudf::strings::slice_strings (by delimiter) functions ([#13628](https://github.com/rapidsai/cudf/pull/13628)) [@davidwendt](https://github.com/davidwendt)
- Create separate libcudf hash APIs for each supported hash function ([#13626](https://github.com/rapidsai/cudf/pull/13626)) [@davidwendt](https://github.com/davidwendt)
- Add convert_dtypes API ([#13623](https://github.com/rapidsai/cudf/pull/13623)) [@shwina](https://github.com/shwina)
- Clean up cupy in dependencies.yaml. ([#13617](https://github.com/rapidsai/cudf/pull/13617)) [@bdice](https://github.com/bdice)
- Use cuda-version to constrain cudatoolkit. ([#13615](https://github.com/rapidsai/cudf/pull/13615)) [@bdice](https://github.com/bdice)
- Add murmurhash3_x64_128 function to libcudf ([#13604](https://github.com/rapidsai/cudf/pull/13604)) [@davidwendt](https://github.com/davidwendt)
- Performance improvement for cudf::strings::like ([#13594](https://github.com/rapidsai/cudf/pull/13594)) [@davidwendt](https://github.com/davidwendt)
- Remove deprecated cudf.set_allocator. ([#13591](https://github.com/rapidsai/cudf/pull/13591)) [@bdice](https://github.com/bdice)
- Clean up cudf device atomic with `cuda::atomic_ref` ([#13583](https://github.com/rapidsai/cudf/pull/13583)) [@PointKernel](https://github.com/PointKernel)
- Add java bindings for distinct count ([#13573](https://github.com/rapidsai/cudf/pull/13573)) [@revans2](https://github.com/revans2)
- Use nvcomp conda package. ([#13566](https://github.com/rapidsai/cudf/pull/13566)) [@bdice](https://github.com/bdice)
- Add exception to string_scalar if input string exceeds size_type ([#13560](https://github.com/rapidsai/cudf/pull/13560)) [@davidwendt](https://github.com/davidwendt)
- Add dispatch for `cudf.Dataframe` to/from `pyarrow.Table` conversion ([#13558](https://github.com/rapidsai/cudf/pull/13558)) [@rjzamora](https://github.com/rjzamora)
- Get rid of `cuco::pair_type` aliases ([#13553](https://github.com/rapidsai/cudf/pull/13553)) [@PointKernel](https://github.com/PointKernel)
- Introduce parity with pandas when `sort=False` in `Groupby` ([#13551](https://github.com/rapidsai/cudf/pull/13551)) [@galipremsagar](https://github.com/galipremsagar)
- Update CMake in docker to 3.26.4 ([#13550](https://github.com/rapidsai/cudf/pull/13550)) [@NvTimLiu](https://github.com/NvTimLiu)
- Clarify source of error message in stream testing. ([#13541](https://github.com/rapidsai/cudf/pull/13541)) [@bdice](https://github.com/bdice)
- Deprecate `strings_to_categorical` in `cudf.read_parquet` ([#13540](https://github.com/rapidsai/cudf/pull/13540)) [@galipremsagar](https://github.com/galipremsagar)
- Update to CMake 3.26.4 ([#13538](https://github.com/rapidsai/cudf/pull/13538)) [@vyasr](https://github.com/vyasr)
- s3 folder naming fix ([#13536](https://github.com/rapidsai/cudf/pull/13536)) [@AyodeAwe](https://github.com/AyodeAwe)
- Implement iloc-getitem using parse-don't-validate approach ([#13534](https://github.com/rapidsai/cudf/pull/13534)) [@wence-](https://github.com/wence-)
- Make synchronization explicit in the names of `hostdevice_*` copying APIs ([#13530](https://github.com/rapidsai/cudf/pull/13530)) [@ttnghia](https://github.com/ttnghia)
- Add benchmark (Google Benchmark) dependency to conda packages. ([#13528](https://github.com/rapidsai/cudf/pull/13528)) [@bdice](https://github.com/bdice)
- Add libcufile to dependencies.yaml. ([#13523](https://github.com/rapidsai/cudf/pull/13523)) [@bdice](https://github.com/bdice)
- Fix some memoization logic in groupby/sort/sort_helper.cu ([#13521](https://github.com/rapidsai/cudf/pull/13521)) [@davidwendt](https://github.com/davidwendt)
- Use sizes_to_offsets_iterator in cudf::gather for strings ([#13520](https://github.com/rapidsai/cudf/pull/13520)) [@davidwendt](https://github.com/davidwendt)
- use rapids-upload-docs script ([#13518](https://github.com/rapidsai/cudf/pull/13518)) [@AyodeAwe](https://github.com/AyodeAwe)
- Support UTF-8 BOM in CSV reader ([#13516](https://github.com/rapidsai/cudf/pull/13516)) [@davidwendt](https://github.com/davidwendt)
- Move stream-related test configuration to CMake ([#13513](https://github.com/rapidsai/cudf/pull/13513)) [@vyasr](https://github.com/vyasr)
- Implement `cudf.option_context` ([#13511](https://github.com/rapidsai/cudf/pull/13511)) [@galipremsagar](https://github.com/galipremsagar)
- Unpin `dask` and `distributed` for development ([#13508](https://github.com/rapidsai/cudf/pull/13508)) [@galipremsagar](https://github.com/galipremsagar)
- Change build.sh to use pip install instead of setup.py ([#13507](https://github.com/rapidsai/cudf/pull/13507)) [@vyasr](https://github.com/vyasr)
- Use test default stream ([#13506](https://github.com/rapidsai/cudf/pull/13506)) [@vyasr](https://github.com/vyasr)
- Remove documentation build scripts for Jenkins ([#13495](https://github.com/rapidsai/cudf/pull/13495)) [@ajschmidt8](https://github.com/ajschmidt8)
- Use east const in include files ([#13494](https://github.com/rapidsai/cudf/pull/13494)) [@karthikeyann](https://github.com/karthikeyann)
- Use east const in src files ([#13493](https://github.com/rapidsai/cudf/pull/13493)) [@karthikeyann](https://github.com/karthikeyann)
- Use east const in tests files ([#13492](https://github.com/rapidsai/cudf/pull/13492)) [@karthikeyann](https://github.com/karthikeyann)
- Use east const in benchmarks files ([#13491](https://github.com/rapidsai/cudf/pull/13491)) [@karthikeyann](https://github.com/karthikeyann)
- Performance improvement for nvtext tokenize/token functions ([#13480](https://github.com/rapidsai/cudf/pull/13480)) [@davidwendt](https://github.com/davidwendt)
- Add pd.Float*Dtype to Avro and ORC mappings ([#13475](https://github.com/rapidsai/cudf/pull/13475)) [@mroeschke](https://github.com/mroeschke)
- Use pandas public APIs where available ([#13467](https://github.com/rapidsai/cudf/pull/13467)) [@mroeschke](https://github.com/mroeschke)
- Allow pd.ArrowDtype in cudf.from_pandas ([#13465](https://github.com/rapidsai/cudf/pull/13465)) [@mroeschke](https://github.com/mroeschke)
- Rework libcudf regex benchmarks with nvbench ([#13464](https://github.com/rapidsai/cudf/pull/13464)) [@davidwendt](https://github.com/davidwendt)
- Remove unused max_rows_tensor parameter from subword tokenizer ([#13463](https://github.com/rapidsai/cudf/pull/13463)) [@davidwendt](https://github.com/davidwendt)
- Separate io-text and nvtext pytests into different files ([#13435](https://github.com/rapidsai/cudf/pull/13435)) [@davidwendt](https://github.com/davidwendt)
- Add a move_to function to cudf::string_view::const_iterator ([#13428](https://github.com/rapidsai/cudf/pull/13428)) [@davidwendt](https://github.com/davidwendt)
- Allow newer scikit-build ([#13424](https://github.com/rapidsai/cudf/pull/13424)) [@vyasr](https://github.com/vyasr)
- Refactor sort_by_values to sort_values, drop indices from return values. ([#13419](https://github.com/rapidsai/cudf/pull/13419)) [@bdice](https://github.com/bdice)
- Inline Cython exception handler ([#13411](https://github.com/rapidsai/cudf/pull/13411)) [@vyasr](https://github.com/vyasr)
- Init JNI version 23.08.0-SNAPSHOT ([#13401](https://github.com/rapidsai/cudf/pull/13401)) [@pxLi](https://github.com/pxLi)
- Refactor ORC reader ([#13396](https://github.com/rapidsai/cudf/pull/13396)) [@ttnghia](https://github.com/ttnghia)
- JNI: Remove cleaned objects in memory cleaner ([#13378](https://github.com/rapidsai/cudf/pull/13378)) [@res-life](https://github.com/res-life)
- Add tests of currently unsupported indexing ([#13338](https://github.com/rapidsai/cudf/pull/13338)) [@wence-](https://github.com/wence-)
- Performance improvement for some libcudf regex functions for long strings ([#13322](https://github.com/rapidsai/cudf/pull/13322)) [@davidwendt](https://github.com/davidwendt)
- Exposure Tracked Buffer (first step towards unifying copy-on-write and spilling) ([#13307](https://github.com/rapidsai/cudf/pull/13307)) [@madsbk](https://github.com/madsbk)
- Write string data directly to column_buffer in Parquet reader ([#13302](https://github.com/rapidsai/cudf/pull/13302)) [@etseidl](https://github.com/etseidl)
- Add stacktrace into cudf exception types ([#13298](https://github.com/rapidsai/cudf/pull/13298)) [@ttnghia](https://github.com/ttnghia)
- cuDF: Build CUDA 12 packages ([#12922](https://github.com/rapidsai/cudf/pull/12922)) [@bdice](https://github.com/bdice)
# cuDF 23.06.00 (7 Jun 2023)
## 🚨 Breaking Changes
- Fix batch processing for parquet writer ([#13438](https://github.com/rapidsai/cudf/pull/13438)) [@ttnghia](https://github.com/ttnghia)
- Use <NA> instead of null to match pandas. ([#13415](https://github.com/rapidsai/cudf/pull/13415)) [@bdice](https://github.com/bdice)
- Remove UNKNOWN_NULL_COUNT ([#13372](https://github.com/rapidsai/cudf/pull/13372)) [@vyasr](https://github.com/vyasr)
- Remove default UNKNOWN_NULL_COUNT from cudf::column member functions ([#13341](https://github.com/rapidsai/cudf/pull/13341)) [@davidwendt](https://github.com/davidwendt)
- Use std::overflow_error when output would exceed column size limit ([#13323](https://github.com/rapidsai/cudf/pull/13323)) [@davidwendt](https://github.com/davidwendt)
- Remove null mask and null count from column_view constructors ([#13311](https://github.com/rapidsai/cudf/pull/13311)) [@vyasr](https://github.com/vyasr)
- Change default value of the `observed=` argument in groupby to `True` to reflect the actual behaviour ([#13296](https://github.com/rapidsai/cudf/pull/13296)) [@shwina](https://github.com/shwina)
- Throw error if UNINITIALIZED is passed to cudf::state_null_count ([#13292](https://github.com/rapidsai/cudf/pull/13292)) [@davidwendt](https://github.com/davidwendt)
- Remove default null-count parameter from cudf::make_strings_column factory ([#13227](https://github.com/rapidsai/cudf/pull/13227)) [@davidwendt](https://github.com/davidwendt)
- Remove UNKNOWN_NULL_COUNT where it can be easily computed ([#13205](https://github.com/rapidsai/cudf/pull/13205)) [@vyasr](https://github.com/vyasr)
- Update minimum Python version to Python 3.9 ([#13196](https://github.com/rapidsai/cudf/pull/13196)) [@shwina](https://github.com/shwina)
- Refactor contiguous_split API into contiguous_split.hpp ([#13186](https://github.com/rapidsai/cudf/pull/13186)) [@abellina](https://github.com/abellina)
- Cleanup Parquet chunked writer ([#13094](https://github.com/rapidsai/cudf/pull/13094)) [@ttnghia](https://github.com/ttnghia)
- Cleanup ORC chunked writer ([#13091](https://github.com/rapidsai/cudf/pull/13091)) [@ttnghia](https://github.com/ttnghia)
- Raise `NotImplementedError` when attempting to construct cuDF objects from timezone-aware datetimes ([#13086](https://github.com/rapidsai/cudf/pull/13086)) [@shwina](https://github.com/shwina)
- Remove deprecated regex functions from libcudf ([#13067](https://github.com/rapidsai/cudf/pull/13067)) [@davidwendt](https://github.com/davidwendt)
- [REVIEW] Upgrade to `arrow-11` ([#12757](https://github.com/rapidsai/cudf/pull/12757)) [@galipremsagar](https://github.com/galipremsagar)
- Implement Python drop_duplicates with cudf::stable_distinct. ([#11656](https://github.com/rapidsai/cudf/pull/11656)) [@brandon-b-miller](https://github.com/brandon-b-miller)
## 🐛 Bug Fixes
- Fix valid count computation in offset_bitmask_binop kernel ([#13489](https://github.com/rapidsai/cudf/pull/13489)) [@davidwendt](https://github.com/davidwendt)
- Fix writing of ORC files with empty rowgroups ([#13466](https://github.com/rapidsai/cudf/pull/13466)) [@vuule](https://github.com/vuule)
- Fix cudf::repeat logic when count is zero ([#13459](https://github.com/rapidsai/cudf/pull/13459)) [@davidwendt](https://github.com/davidwendt)
- Fix batch processing for parquet writer ([#13438](https://github.com/rapidsai/cudf/pull/13438)) [@ttnghia](https://github.com/ttnghia)
- Fix invalid use of std::exclusive_scan in Parquet writer ([#13434](https://github.com/rapidsai/cudf/pull/13434)) [@etseidl](https://github.com/etseidl)
- Patch numba if it is imported first to ensure minor version compatibility works. ([#13433](https://github.com/rapidsai/cudf/pull/13433)) [@bdice](https://github.com/bdice)
- Fix cudf::strings::replace_with_backrefs hang on empty match result ([#13418](https://github.com/rapidsai/cudf/pull/13418)) [@davidwendt](https://github.com/davidwendt)
- Use <NA> instead of null to match pandas. ([#13415](https://github.com/rapidsai/cudf/pull/13415)) [@bdice](https://github.com/bdice)
- Fix tokenize with non-space delimiter ([#13403](https://github.com/rapidsai/cudf/pull/13403)) [@shwina](https://github.com/shwina)
- Fix groupby head/tail for empty dataframe ([#13398](https://github.com/rapidsai/cudf/pull/13398)) [@shwina](https://github.com/shwina)
- Default to closed="right" in `IntervalIndex` constructor ([#13394](https://github.com/rapidsai/cudf/pull/13394)) [@shwina](https://github.com/shwina)
- Correctly reorder and reindex scan groupbys with null keys ([#13389](https://github.com/rapidsai/cudf/pull/13389)) [@wence-](https://github.com/wence-)
- Fix unused argument errors in nvcc 11.5 ([#13387](https://github.com/rapidsai/cudf/pull/13387)) [@abellina](https://github.com/abellina)
- Updates needed to work with jitify that leverages libcudacxx ([#13383](https://github.com/rapidsai/cudf/pull/13383)) [@robertmaynard](https://github.com/robertmaynard)
- Fix unused parameter warning/error in parquet/page_data.cu ([#13367](https://github.com/rapidsai/cudf/pull/13367)) [@davidwendt](https://github.com/davidwendt)
- Fix page size estimation in Parquet writer ([#13364](https://github.com/rapidsai/cudf/pull/13364)) [@etseidl](https://github.com/etseidl)
- Fix subword_tokenize error when input contains no tokens ([#13320](https://github.com/rapidsai/cudf/pull/13320)) [@davidwendt](https://github.com/davidwendt)
- Support gcc 12 as the C++ compiler ([#13316](https://github.com/rapidsai/cudf/pull/13316)) [@robertmaynard](https://github.com/robertmaynard)
- Correctly set bitmask size in `from_column_view` ([#13315](https://github.com/rapidsai/cudf/pull/13315)) [@wence-](https://github.com/wence-)
- Fix approach to detecting assignment for gte/lte operators ([#13285](https://github.com/rapidsai/cudf/pull/13285)) [@vyasr](https://github.com/vyasr)
- Fix parquet schema interpretation issue ([#13277](https://github.com/rapidsai/cudf/pull/13277)) [@hyperbolic2346](https://github.com/hyperbolic2346)
- Fix 64bit shift bug in avro reader ([#13276](https://github.com/rapidsai/cudf/pull/13276)) [@karthikeyann](https://github.com/karthikeyann)
- Fix unused variables/parameters in parquet/writer_impl.cu ([#13263](https://github.com/rapidsai/cudf/pull/13263)) [@davidwendt](https://github.com/davidwendt)
- Clean up buffers in case AssertionError ([#13262](https://github.com/rapidsai/cudf/pull/13262)) [@razajafri](https://github.com/razajafri)
- Allow empty input table in ast `compute_column` ([#13245](https://github.com/rapidsai/cudf/pull/13245)) [@wence-](https://github.com/wence-)
- Fix structs_column_wrapper constructors to copy input column wrappers ([#13243](https://github.com/rapidsai/cudf/pull/13243)) [@davidwendt](https://github.com/davidwendt)
- Fix the row index stream order in ORC reader ([#13242](https://github.com/rapidsai/cudf/pull/13242)) [@vuule](https://github.com/vuule)
- Make `is_decompression_disabled` and `is_compression_disabled` thread-safe ([#13240](https://github.com/rapidsai/cudf/pull/13240)) [@vuule](https://github.com/vuule)
- Add [[maybe_unused]] to nvbench environment. ([#13219](https://github.com/rapidsai/cudf/pull/13219)) [@bdice](https://github.com/bdice)
- Fix race in ORC string dictionary creation ([#13214](https://github.com/rapidsai/cudf/pull/13214)) [@revans2](https://github.com/revans2)
- Add scalar argtypes to udf cache keys ([#13194](https://github.com/rapidsai/cudf/pull/13194)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Fix unused parameter warning/error in grouped_rolling.cu ([#13192](https://github.com/rapidsai/cudf/pull/13192)) [@davidwendt](https://github.com/davidwendt)
- Avoid skbuild 0.17.2 which affected the cmake -DPython_LIBRARY string ([#13188](https://github.com/rapidsai/cudf/pull/13188)) [@sevagh](https://github.com/sevagh)
- Fix `hostdevice_vector::subspan` ([#13187](https://github.com/rapidsai/cudf/pull/13187)) [@ttnghia](https://github.com/ttnghia)
- Use custom nvbench entry point to ensure `cudf::nvbench_base_fixture` usage ([#13183](https://github.com/rapidsai/cudf/pull/13183)) [@robertmaynard](https://github.com/robertmaynard)
- Fix slice_strings to return empty strings for stop < start indices ([#13178](https://github.com/rapidsai/cudf/pull/13178)) [@davidwendt](https://github.com/davidwendt)
- Allow compilation with any GTest version 1.11+ ([#13153](https://github.com/rapidsai/cudf/pull/13153)) [@robertmaynard](https://github.com/robertmaynard)
- Fix a few clang-format style check errors ([#13146](https://github.com/rapidsai/cudf/pull/13146)) [@davidwendt](https://github.com/davidwendt)
- [REVIEW] Fix `Series` and `DataFrame` constructors to validate index lengths ([#13122](https://github.com/rapidsai/cudf/pull/13122)) [@galipremsagar](https://github.com/galipremsagar)
- Fix hash join when the input tables have nulls on only one side ([#13120](https://github.com/rapidsai/cudf/pull/13120)) [@ttnghia](https://github.com/ttnghia)
- Fix GPU_ARCHS setting in Java CMake build and CMAKE_CUDA_ARCHITECTURES in Python package build. ([#13117](https://github.com/rapidsai/cudf/pull/13117)) [@davidwendt](https://github.com/davidwendt)
- Adds checks to make sure json reader won't overflow ([#13115](https://github.com/rapidsai/cudf/pull/13115)) [@elstehle](https://github.com/elstehle)
- Fix `null_count` of columns returned by `chunked_parquet_reader` ([#13111](https://github.com/rapidsai/cudf/pull/13111)) [@vuule](https://github.com/vuule)
- Fixes sliced list and struct column bug in JSON chunked writer ([#13108](https://github.com/rapidsai/cudf/pull/13108)) [@karthikeyann](https://github.com/karthikeyann)
- [REVIEW] Fix missing confluent kafka version ([#13101](https://github.com/rapidsai/cudf/pull/13101)) [@galipremsagar](https://github.com/galipremsagar)
- Use make_empty_lists_column instead of make_empty_column(type_id::LIST) ([#13099](https://github.com/rapidsai/cudf/pull/13099)) [@davidwendt](https://github.com/davidwendt)
- Raise `NotImplementedError` when attempting to construct cuDF objects from timezone-aware datetimes ([#13086](https://github.com/rapidsai/cudf/pull/13086)) [@shwina](https://github.com/shwina)
- Fix column selection `read_parquet` benchmarks ([#13082](https://github.com/rapidsai/cudf/pull/13082)) [@vuule](https://github.com/vuule)
- Fix bugs in iterative groupby apply algorithm ([#13078](https://github.com/rapidsai/cudf/pull/13078)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Add algorithm include in data_sink.hpp ([#13068](https://github.com/rapidsai/cudf/pull/13068)) [@ahendriksen](https://github.com/ahendriksen)
- Fix tests/identify_stream_usage.cpp ([#13066](https://github.com/rapidsai/cudf/pull/13066)) [@ahendriksen](https://github.com/ahendriksen)
- Prevent overflow with `skip_rows` in ORC and Parquet readers ([#13063](https://github.com/rapidsai/cudf/pull/13063)) [@vuule](https://github.com/vuule)
- Add except declaration in Cython interface for regex_program::create ([#13054](https://github.com/rapidsai/cudf/pull/13054)) [@davidwendt](https://github.com/davidwendt)
- [REVIEW] Fix branch version in CI scripts ([#13029](https://github.com/rapidsai/cudf/pull/13029)) [@galipremsagar](https://github.com/galipremsagar)
- Fix OOB memory access in CSV reader when reading without NA values ([#13011](https://github.com/rapidsai/cudf/pull/13011)) [@vuule](https://github.com/vuule)
- Fix read_avro() skip_rows and num_rows. ([#12912](https://github.com/rapidsai/cudf/pull/12912)) [@tpn](https://github.com/tpn)
- Purge nonempty nulls from byte_cast list outputs. ([#11971](https://github.com/rapidsai/cudf/pull/11971)) [@bdice](https://github.com/bdice)
- Fix consumption of CPU-backed interchange protocol dataframes ([#11392](https://github.com/rapidsai/cudf/pull/11392)) [@shwina](https://github.com/shwina)
## 🚀 New Features
- Remove numba JIT kernel usage from dataframe copy tests ([#13385](https://github.com/rapidsai/cudf/pull/13385)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Add JNI for ORC/Parquet writer compression statistics ([#13376](https://github.com/rapidsai/cudf/pull/13376)) [@ttnghia](https://github.com/ttnghia)
- Use _compile_or_get in JIT groupby apply ([#13350](https://github.com/rapidsai/cudf/pull/13350)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- cuDF numba cuda 12 updates ([#13337](https://github.com/rapidsai/cudf/pull/13337)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Add tz_convert method to convert between timestamps ([#13328](https://github.com/rapidsai/cudf/pull/13328)) [@shwina](https://github.com/shwina)
- Optionally return compression statistics from ORC and Parquet writers ([#13294](https://github.com/rapidsai/cudf/pull/13294)) [@vuule](https://github.com/vuule)
- Support the case=False argument to str.contains ([#13290](https://github.com/rapidsai/cudf/pull/13290)) [@shwina](https://github.com/shwina)
- Add an event handler for ColumnVector.close ([#13279](https://github.com/rapidsai/cudf/pull/13279)) [@abellina](https://github.com/abellina)
- JNI api for cudf::chunked_pack ([#13278](https://github.com/rapidsai/cudf/pull/13278)) [@abellina](https://github.com/abellina)
- Implement a chunked_pack API ([#13260](https://github.com/rapidsai/cudf/pull/13260)) [@abellina](https://github.com/abellina)
- Update cudf recipes to use GTest version to >=1.13 ([#13207](https://github.com/rapidsai/cudf/pull/13207)) [@robertmaynard](https://github.com/robertmaynard)
- JNI changes for range-extents in window functions. ([#13199](https://github.com/rapidsai/cudf/pull/13199)) [@mythrocks](https://github.com/mythrocks)
- Add support for DatetimeTZDtype and tz_localize ([#13163](https://github.com/rapidsai/cudf/pull/13163)) [@shwina](https://github.com/shwina)
- Add IS_NULL operator to AST ([#13145](https://github.com/rapidsai/cudf/pull/13145)) [@karthikeyann](https://github.com/karthikeyann)
- STRING order-by column for RANGE window functions ([#13143](https://github.com/rapidsai/cudf/pull/13143)) [@mythrocks](https://github.com/mythrocks)
- Update `contains_table` to experimental row hasher and equality comparator ([#13119](https://github.com/rapidsai/cudf/pull/13119)) [@divyegala](https://github.com/divyegala)
- Automatically select `GroupBy.apply` algorithm based on if the UDF is jittable ([#13113](https://github.com/rapidsai/cudf/pull/13113)) [@brandon-b-miller](https://github.com/brandon-b-miller)
- Refactor Parquet chunked writer ([#13076](https://github.com/rapidsai/cudf/pull/13076)) [@ttnghia](https://github.com/ttnghia)
- Add Python bindings for string literal support in AST ([#13073](https://github.com/rapidsai/cudf/pull/13073)) [@karthikeyann](https://github.com/karthikeyann)
- Add Java bindings for string literal support in AST ([#13072](https://github.com/rapidsai/cudf/pull/13072)) [@karthikeyann](https://github.com/karthikeyann)
- Add string scalar support in AST ([#13061](https://github.com/rapidsai/cudf/pull/13061)) [@karthikeyann](https://github.com/karthikeyann)
- Log cuIO warnings using the libcudf logger ([#13043](https://github.com/rapidsai/cudf/pull/13043)) [@vuule](https://github.com/vuule)
- Update `mixed_join` to use experimental row hasher and comparator ([#13028](https://github.com/rapidsai/cudf/pull/13028)) [@divyegala](https://github.com/divyegala)
- Support structs of lists in row lexicographic comparator ([#13005](https://github.com/rapidsai/cudf/pull/13005)) [@ttnghia](https://github.com/ttnghia)
- Adding `hostdevice_span` that is a span createable from `hostdevice_vector` ([#12981](https://github.com/rapidsai/cudf/pull/12981)) [@hyperbolic2346](https://github.com/hyperbolic2346)
- Add nvtext::minhash function ([#12961](https://github.com/rapidsai/cudf/pull/12961)) [@davidwendt](https://github.com/davidwendt)
- Support lists of structs in row lexicographic comparator ([#12953](https://github.com/rapidsai/cudf/pull/12953)) [@ttnghia](https://github.com/ttnghia)
- Update `join` to use experimental row hasher and comparator ([#12787](https://github.com/rapidsai/cudf/pull/12787)) [@divyegala](https://github.com/divyegala)
- Implement Python drop_duplicates with cudf::stable_distinct. ([#11656](https://github.com/rapidsai/cudf/pull/11656)) [@brandon-b-miller](https://github.com/brandon-b-miller)
## 🛠️ Improvements
- Drop extraneous dependencies from cudf conda recipe. ([#13406](https://github.com/rapidsai/cudf/pull/13406)) [@bdice](https://github.com/bdice)
- Handle some corner-cases in indexing with boolean masks ([#13402](https://github.com/rapidsai/cudf/pull/13402)) [@wence-](https://github.com/wence-)
- Add cudf::stable_distinct public API, tests, and benchmarks. ([#13392](https://github.com/rapidsai/cudf/pull/13392)) [@bdice](https://github.com/bdice)
- [JNI] Pass this ColumnVector to the onClosed event handler ([#13386](https://github.com/rapidsai/cudf/pull/13386)) [@abellina](https://github.com/abellina)
- Fix JNI method with mismatched parameter list ([#13384](https://github.com/rapidsai/cudf/pull/13384)) [@ttnghia](https://github.com/ttnghia)
- Split up experimental_row_operator_tests.cu to improve its compile time ([#13382](https://github.com/rapidsai/cudf/pull/13382)) [@davidwendt](https://github.com/davidwendt)
- Deprecate cudf::strings::slice_strings APIs that accept delimiters ([#13373](https://github.com/rapidsai/cudf/pull/13373)) [@davidwendt](https://github.com/davidwendt)
- Remove UNKNOWN_NULL_COUNT ([#13372](https://github.com/rapidsai/cudf/pull/13372)) [@vyasr](https://github.com/vyasr)
- Move some nvtext benchmarks to nvbench ([#13368](https://github.com/rapidsai/cudf/pull/13368)) [@davidwendt](https://github.com/davidwendt)
- run docs nightly too ([#13366](https://github.com/rapidsai/cudf/pull/13366)) [@AyodeAwe](https://github.com/AyodeAwe)
- Add warning for default `dtype` parameter in `get_dummies` ([#13365](https://github.com/rapidsai/cudf/pull/13365)) [@galipremsagar](https://github.com/galipremsagar)
- Add log messages about kvikIO compatibility mode ([#13363](https://github.com/rapidsai/cudf/pull/13363)) [@vuule](https://github.com/vuule)
- Switch back to using primary shared-action-workflows branch ([#13362](https://github.com/rapidsai/cudf/pull/13362)) [@vyasr](https://github.com/vyasr)
- Deprecate `StringIndex` and use `Index` instead ([#13361](https://github.com/rapidsai/cudf/pull/13361)) [@galipremsagar](https://github.com/galipremsagar)
- Ensure columns have valid null counts in CUDF JNI. ([#13355](https://github.com/rapidsai/cudf/pull/13355)) [@mythrocks](https://github.com/mythrocks)
- Expunge most uses of `TypeVar(bound="Foo")` ([#13346](https://github.com/rapidsai/cudf/pull/13346)) [@wence-](https://github.com/wence-)
- Remove all references to UNKNOWN_NULL_COUNT in Python ([#13345](https://github.com/rapidsai/cudf/pull/13345)) [@vyasr](https://github.com/vyasr)
- Improve `distinct_count` with `cuco::static_set` ([#13343](https://github.com/rapidsai/cudf/pull/13343)) [@PointKernel](https://github.com/PointKernel)
- Fix `contiguous_split` performance ([#13342](https://github.com/rapidsai/cudf/pull/13342)) [@ttnghia](https://github.com/ttnghia)
- Remove default UNKNOWN_NULL_COUNT from cudf::column member functions ([#13341](https://github.com/rapidsai/cudf/pull/13341)) [@davidwendt](https://github.com/davidwendt)
- Update mypy to 1.3 ([#13340](https://github.com/rapidsai/cudf/pull/13340)) [@wence-](https://github.com/wence-)
- [Java] Purge non-empty nulls when setting validity ([#13335](https://github.com/rapidsai/cudf/pull/13335)) [@razajafri](https://github.com/razajafri)
- Add row-wise filtering step to `read_parquet` ([#13334](https://github.com/rapidsai/cudf/pull/13334)) [@rjzamora](https://github.com/rjzamora)
- Performance improvement for nvtext::minhash ([#13333](https://github.com/rapidsai/cudf/pull/13333)) [@davidwendt](https://github.com/davidwendt)
- Fix some libcudf functions to set the null count on returning columns ([#13331](https://github.com/rapidsai/cudf/pull/13331)) [@davidwendt](https://github.com/davidwendt)
- Change cudf::detail::concatenate_masks to return null-count ([#13330](https://github.com/rapidsai/cudf/pull/13330)) [@davidwendt](https://github.com/davidwendt)
- Move `meta` calculation in `dask_cudf.read_parquet` ([#13327](https://github.com/rapidsai/cudf/pull/13327)) [@rjzamora](https://github.com/rjzamora)
- Changes to support Numpy >= 1.24 ([#13325](https://github.com/rapidsai/cudf/pull/13325)) [@shwina](https://github.com/shwina)
- Use std::overflow_error when output would exceed column size limit ([#13323](https://github.com/rapidsai/cudf/pull/13323)) [@davidwendt](https://github.com/davidwendt)
- Clean up `distinct_count` benchmark ([#13321](https://github.com/rapidsai/cudf/pull/13321)) [@PointKernel](https://github.com/PointKernel)
- Fix gtest pinning to 1.13.0. ([#13319](https://github.com/rapidsai/cudf/pull/13319)) [@bdice](https://github.com/bdice)
- Remove null mask and null count from column_view constructors ([#13311](https://github.com/rapidsai/cudf/pull/13311)) [@vyasr](https://github.com/vyasr)
- Address feedback from 13289 ([#13306](https://github.com/rapidsai/cudf/pull/13306)) [@vyasr](https://github.com/vyasr)
- Change default value of the `observed=` argument in groupby to `True` to reflect the actual behaviour ([#13296](https://github.com/rapidsai/cudf/pull/13296)) [@shwina](https://github.com/shwina)
- First check for `BaseDtype` when infering the data type of an arbitrary object ([#13295](https://github.com/rapidsai/cudf/pull/13295)) [@shwina](https://github.com/shwina)
- Throw error if UNINITIALIZED is passed to cudf::state_null_count ([#13292](https://github.com/rapidsai/cudf/pull/13292)) [@davidwendt](https://github.com/davidwendt)
- Support CUDA 12.0 for pip wheels ([#13289](https://github.com/rapidsai/cudf/pull/13289)) [@divyegala](https://github.com/divyegala)
- Refactor `transform_lists_of_structs` in `row_operators.cu` ([#13288](https://github.com/rapidsai/cudf/pull/13288)) [@ttnghia](https://github.com/ttnghia)
- Branch 23.06 merge 23.04 ([#13286](https://github.com/rapidsai/cudf/pull/13286)) [@vyasr](https://github.com/vyasr)
- Update cupy dependency ([#13284](https://github.com/rapidsai/cudf/pull/13284)) [@vyasr](https://github.com/vyasr)
- Performance improvement in cudf::strings::join_strings for long strings ([#13283](https://github.com/rapidsai/cudf/pull/13283)) [@davidwendt](https://github.com/davidwendt)
- Fix unused variables and functions ([#13275](https://github.com/rapidsai/cudf/pull/13275)) [@karthikeyann](https://github.com/karthikeyann)
- Fix integer overflow in `partition` `scatter_map` construction ([#13272](https://github.com/rapidsai/cudf/pull/13272)) [@wence-](https://github.com/wence-)
- Numba 0.57 compatibility fixes ([#13271](https://github.com/rapidsai/cudf/pull/13271)) [@gmarkall](https://github.com/gmarkall)
- Performance improvement in cudf::strings::all_characters_of_type ([#13259](https://github.com/rapidsai/cudf/pull/13259)) [@davidwendt](https://github.com/davidwendt)
- Remove default null-count parameter from some libcudf factory functions ([#13258](https://github.com/rapidsai/cudf/pull/13258)) [@davidwendt](https://github.com/davidwendt)
- Roll our own generate_string() because mimesis' has gone away ([#13257](https://github.com/rapidsai/cudf/pull/13257)) [@shwina](https://github.com/shwina)
- Build wheels using new single image workflow ([#13249](https://github.com/rapidsai/cudf/pull/13249)) [@vyasr](https://github.com/vyasr)
- Enable sccache hits from local builds ([#13248](https://github.com/rapidsai/cudf/pull/13248)) [@AyodeAwe](https://github.com/AyodeAwe)
- Revert to branch-23.06 for shared-action-workflows ([#13247](https://github.com/rapidsai/cudf/pull/13247)) [@shwina](https://github.com/shwina)
- Introduce `pandas_compatible` option in `cudf` ([#13241](https://github.com/rapidsai/cudf/pull/13241)) [@galipremsagar](https://github.com/galipremsagar)
- Add metadata_builder helper class ([#13232](https://github.com/rapidsai/cudf/pull/13232)) [@abellina](https://github.com/abellina)
- Use libkvikio conda packages in libcudf, add explicit libcufile dependency. ([#13231](https://github.com/rapidsai/cudf/pull/13231)) [@bdice](https://github.com/bdice)
- Remove default null-count parameter from cudf::make_strings_column factory ([#13227](https://github.com/rapidsai/cudf/pull/13227)) [@davidwendt](https://github.com/davidwendt)
- Performance improvement in cudf::strings::find/rfind for long strings ([#13226](https://github.com/rapidsai/cudf/pull/13226)) [@davidwendt](https://github.com/davidwendt)
- Add chunked reader benchmark ([#13223](https://github.com/rapidsai/cudf/pull/13223)) [@SrikarVanavasam](https://github.com/SrikarVanavasam)
- Set the null count in output columns in the CSV reader ([#13221](https://github.com/rapidsai/cudf/pull/13221)) [@vuule](https://github.com/vuule)
- Skip Non-Empty nulls tests for the nightly build just like we skip CuFileTest and CudaFatalTest ([#13213](https://github.com/rapidsai/cudf/pull/13213)) [@razajafri](https://github.com/razajafri)
- Fix string_scalar stream usage in write_json.cu ([#13212](https://github.com/rapidsai/cudf/pull/13212)) [@davidwendt](https://github.com/davidwendt)
- Use canonicalized name for dlopen'd libraries (libcufile) ([#13210](https://github.com/rapidsai/cudf/pull/13210)) [@shwina](https://github.com/shwina)
- Refactor pinned memory vector and ORC+Parquet writers ([#13206](https://github.com/rapidsai/cudf/pull/13206)) [@ttnghia](https://github.com/ttnghia)
- Remove UNKNOWN_NULL_COUNT where it can be easily computed ([#13205](https://github.com/rapidsai/cudf/pull/13205)) [@vyasr](https://github.com/vyasr)
- Optimization to decoding of parquet level streams ([#13203](https://github.com/rapidsai/cudf/pull/13203)) [@nvdbaranec](https://github.com/nvdbaranec)
- Clean up and simplify `gpuDecideCompression` ([#13202](https://github.com/rapidsai/cudf/pull/13202)) [@vuule](https://github.com/vuule)
- Use std::array for a statically sized vector in `create_serialized_trie` ([#13201](https://github.com/rapidsai/cudf/pull/13201)) [@vuule](https://github.com/vuule)
- Update minimum Python version to Python 3.9 ([#13196](https://github.com/rapidsai/cudf/pull/13196)) [@shwina](https://github.com/shwina)