Releases: apache/iceberg
Apache Iceberg 1.6.1
What's Changed
- Core: Limit ParallelIterable memory consumption by yielding in tasks by @raunaqmorarka in #10787
- [1.6] Core: Drop ParallelIterable's queue low water mark by @findepi in #10979
- Build: Bump orc from 1.9.3 to 1.9.4 (#10728) by @Fokko in #10988
New Contributors
- @raunaqmorarka made their first contribution in #10787
Full Changelog: apache-iceberg-1.6.0...apache-iceberg-1.6.1
Apache Iceberg 1.6.0
What's Changed
- API, Spark 3.3: Remove all usages of deprecated AssertHelpers by @findepi in #10500
- API: Fix default FileIO#newInputFile ManifestFile, DataFile and DeleteFile implementation to pass lengths by @amogh-jahagirdar in #9953
- AWS, Core: Replace .withFailMessage() usage with .as() by @nastra in #10000
- AWS: Fix TestGlueCatalogTable#testCreateTable by @aajisaka in #10221
- AWS: Make sure Signer + User Agent config are both applied by @nastra in #10198
- AWS: Retain Glue Catalog column comment after updating Iceberg table by @lawofcycles in #10276
- AWS: Retain Glue Catalog table description after updating Iceberg table by @aajisaka in #10199
- AWS: Support S3 DSSE-KMS encryption by @aajisaka in #8370
- AWS: close underlying executor for DynamoDb LockManager by @regadas in #10132
- Add 13 Dremio Blogs + Fix a few incorrect dates by @AlexMercedCoder in #9967
- Add EnumConfParser to SparkConfParser by @huaxingao in #10311
- Add Pagination To List Apis by @rahil-c in #9782
- Add bloom filter fpp config by @huaxingao in #10149
- Add checkstyle rule for uppercase constant fields by @attilakreiner in #10673
- Add issue template and docs for iceberg proposals by @danielcweeks in #9932
- Add local nightly build to test current docs changes by @bitsondatadev in #9943
- Add stale PRs management by @jbonofre in #10134
- Add support for providing output-spec-id during rewrite - spark 3.5 by @himadripal in #9803
- Address Intellij inspection findings by @snazy in #10583
- Allow Java 17 in contribute.md by @findepi in #10545
- Apply IntelliJ inspection findings to older Spark + Flink versions by @snazy in #10625
- Avoid adding a closed client to the pool by @flyrain in #10337
- Aws: Add Iceberg version to UserAgent in S3 requests by @CsengerG in #9963
- Backport Flink 1.18 JUnit5 migration to Flink 1.17 by @tomtongue in #10163
- Backport HadoopCatalog related classes in Flink by @tomtongue in #10620
- Backport source package changes in Flink to other versions by @tomtongue in #10663
- Basic manifest encryption by @ggershinsky in #8252
- Build: Align Jackson versions by @nastra in #9925
- Build: Bump Nessie to 0.90.4 by @adutra in #10492
- Build: Bump Nessie to 0.91.2 by @adutra in #10563
- Build: Bump Spark 3.5 to 3.5.1 by @manuzhang in #9832
- Build: Bump arrow from 15.0.0 to 15.0.1 by @dependabot in #9910
- Build: Bump arrow from 15.0.1 to 15.0.2 by @dependabot in #10034
- Build: Bump com.azure:azure-sdk-bom from 1.2.20 to 1.2.21 by @dependabot in #9857
- Build: Bump com.azure:azure-sdk-bom from 1.2.21 to 1.2.22 by @dependabot in #10071
- Build: Bump com.azure:azure-sdk-bom from 1.2.22 to 1.2.23 by @dependabot in #10238
- Build: Bump com.azure:azure-sdk-bom from 1.2.23 to 1.2.24 by @dependabot in #10420
- Build: Bump com.azure:azure-sdk-bom from 1.2.24 to 1.2.25 by @dependabot in #10652
- Build: Bump com.esotericsoftware:kryo from 4.0.2 to 4.0.3 by @dependabot in #9984
- Build: Bump com.google.cloud:libraries-bom from 26.28.0 to 26.43.0 by @dependabot in #10699
- Build: Bump com.google.errorprone:error_prone_annotations from 2.24.1 to 2.26.1 by @dependabot in #9972
- Build: Bump com.google.errorprone:error_prone_annotations from 2.26.1 to 2.27.0 by @dependabot in #10236
- Build: Bump com.google.errorprone:error_prone_annotations from 2.27.0 to 2.28.0 by @dependabot in #10418
- Build: Bump com.gorylenko.gradle-git-properties:gradle-git-properties from 2.4.1 to 2.4.2 by @dependabot in #10239
- Build: Bump com.palantir.gradle.gitversion:gradle-git-version from 3.0.0 to 3.1.0 by @dependabot in #10468
- Build: Bump datamodel-code-generator from 0.25.4 to 0.25.5 by @dependabot in #9979
- Build: Bump datamodel-code-generator from 0.25.5 to 0.25.6 by @dependabot in #10242
- Build: Bump datamodel-code-generator from 0.25.6 to 0.25.7 by @dependabot in #10507
- Build: Bump datamodel-code-generator from 0.25.7 to 0.25.8 by @dependabot in #10649
- Build: Bump gradle.plugin.io.morethan.jmhreport:gradle-jmh-report from 0.9.0 to 0.9.6 by @dependabot in #10193
- Build: Bump guava from 33.0.0-jre to 33.1.0-jre by @dependabot in #9977
- Build: Bump guava from 33.1.0-jre to 33.2.0-jre by @dependabot in #10271
- Build: Bump guava from 33.2.0-jre to 33.2.1-jre by @dependabot in #10414
- Build: Bump io.airlift:aircompressor from 0.26 to 0.27 by @dependabot in #10383
- Build: Bump io.delta:delta-spark_2.12 from 3.1.0 to 3.2.0 by @dependabot in #10320
- Build: Bump io.delta:delta-standalone_2.12 from 3.1.0 to 3.2.0 by @dependabot in #10321
- Build: Bump io.github.goooler.shadow:shadow-gradle-plugin from 8.1.7 to 8.1.8 by @dependabot in #10612
- Build: Bump io.netty:netty-buffer from 4.1.107.Final to 4.1.108.Final by @dependabot in #10032
- Build: Bump io.netty:netty-buffer from 4.1.108.Final to 4.1.109.Final by @dependabot in #10191
- Build: Bump io.netty:netty-buffer from 4.1.109.Final to 4.1.110.Final by @dependabot in #10384
- Build: Bump io.netty:netty-buffer from 4.1.110.Final to 4.1.111.Final by @dependabot in #10504
- Build: Bump jetty from 9.4.53.v20231009 to 9.4.54.v20240208 by @dependabot in #9982
- Build: Bump jetty from 9.4.54.v20240208 to 9.4.55.v20240627 by @dependabot in #10654
- Build: Bump kafka from 3.6.1 to 3.7.0 by @dependabot in #9855
- Build: Bump kafka from 3.7.0 to 3.7.1 by @dependabot in #10653
- Build: Bump mkdocs-material from 9.5.14 to 9.5.15 by @dependabot in #10031
- Build: Bump mkdocs-material from 9.5.15 to 9.5.17 by @dependabot in #10092
- Build: Bump mkdocs-material from 9.5.17 to 9.5.18 by @dependabot in #10189
- Build: Bump mkdocs-material from 9.5.18 to 9.5.19 by @dependabot in #10241
- Build: Bump mkdocs-material from 9.5.19 to 9.5.21 by @dependabot in #10272
- Build: Bump mkdocs-material from 9.5.21 to 9.5.23 by @dependabot in #10353
- Build: Bump mkdocs-material from 9.5.23 to 9.5.25 by @dependabot in #10413
- Build: Bump mkdocs-material from 9.5.25 to 9.5.26 by @dependabot in #10464
- Build: Bump mkdocs-material from 9.5.26 to 9.5.27 by @dependabot in #10555
- Build: Bump mkdocs-material from 9.5.27 to 9.5.28 by @dependabot in #10648
- Build: Bump mkdocs-material from 9.5.9 to 9.5.14 by @dependabot in #9983
- Build: Bump nessie from 0.77.1 to 0.79.0 by @dependabot in #9976
- Build: Bump nessie from 0.79.0 to 0.80.0 by @dependabot in #10237
- Build: Bump nessie from 0.80.0 to 0.81.1 by @dependabot in #10267
- Build: Bump nessie from 0.81.1 to 0.82.0 by @dependabot in #10318
- Build: Bump nessie from 0.82.0 to 0.83.2 by @dependabot in #10381
- Build: Bump nessie from 0.90.4 to 0.91.1 by @dependabot in #10551
- Build: Bump nessie from 0.91.2 to 0.91.3 by @dependabot in #10608
- Build: Bump nessie from 0.92.0 to 0.92.1 by @dependabot in #10697
- Build: Bump net.snowflake:snow...
Apache Iceberg 1.5.2
The 1.5.2 release contains the same changes as the 1.5.1 release. The 1.5.1 release had issues with the Spark runtime artifacts: certain artifacts were built with the wrong Scala version. Upgrading to 1.5.2 is strongly recommended for any system that is using 1.5.1.
Apache Iceberg 1.5.1
What's Changed
- [1.5.x] API: Fix default FileIO#newInputFile ManifestFile, DataFile and DeleteFile implementations by @amogh-jahagirdar in #10114
- [1.5.x] Core: Mark 502 and 504 failures as retryable to the exponential retry strategy by @amogh-jahagirdar in #10113
- Core: Fix JDBC Catalog table commit when migrating from schema V0 to V1 (#101111) by @jbonofre in #10152
- Core: Fix namespace SQL statement using ESCAPE character that works with MySQL/PostgreSQL (#10167) by @jbonofre in #10169
- (1.5.x cherry-pick) Spark 3.5: Fix system function pushdown in CoW row-level commands by @amogh-jahagirdar in #10170
- (1.5.x Cherry-pick) Spark 3.4: Fix system function pushdown in CoW row-level commands (#10119) by @amogh-jahagirdar in #10171
Full Changelog: apache-iceberg-1.5.0...apache-iceberg-1.5.1
Apache Iceberg 1.5.0
Apache Iceberg 1.5.0 was released on March 11, 2024.
The 1.5.0 release adds a variety of new features and bug fixes.
- API
- Core
- Add view support for REST catalog (#7913)
- Add view support for JDBC catalog (#9487)
- Add catalog type for glue,jdbc,nessie (#9647)
- Support Avro file encryption with AES GCM streams (#9436)
- Add ApplyNameMapping for Avro (#9347)
- Add StandardEncryptionManager (#9277)
- Add REST catalog table session cache (#8920)
- Support view metadata compression (#8552)
- Track partition statistics in TableMetadata (#8502)
- Enable column statistics filtering after planning (#8803)
- Spark
- Remove support for Spark 3.2 (#9295)
- Support views via SQL for Spark 3.4 and 3.5 (#9423, #9421, #9343, #9513, #9582)
- Support executor cache locality (#9563)
- Added support for delete manifest rewrites (#9020)
- Support encrypted output files (#9435)
- Add Spark UI metrics from Iceberg scan metrics (#8717)
- Parallelize reading files in add_files procedure (#9274)
- Support file and partition delete granularity (#9384)
- Flink
- Parquet
- Kafka-Connect
- Spec
- Vendor Integrations
- AWS: Support setting description for Glue table (#9530)
- AWS: Update S3FileIO test to run when CLIENT_FACTORY is not set (#9541)
- AWS: Add S3 Access Grants Integration (#9385)
- AWS: Glue catalog strip trailing slash on DB URI (#8870)
- Azure: Add FileIO that supports ADLSv2 storage (#8303)
- Azure: Make ADLSFileIO implement DelegateFileIO (#8563)
- Nessie: Support views for NessieCatalog (#8909)
- Nessie: Strip trailing slash for warehouse location (#9415)
- Nessie: Infer default API version from URI (#9459)
- Dependencies
- Bump Nessie to 0.77.1
- Bump ORC to 1.9.2
- Bump Arrow to 15.0.0
- Bump AWS Java SDK to 2.24.5
- Bump Azure Java SDK to 1.2.20
- Bump Google cloud libraries to 26.28.0
Note:
- To enable view support for JDBC catalog, configure jdbc.schema-version to V1 in catalog properties.
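As a rough sketch of the note above, the snippet below builds Spark-style configuration keys for a JDBC catalog with jdbc.schema-version set to V1. The catalog name my_catalog, the connection URI, and the warehouse path are placeholders, and the helper spark_catalog_conf is hypothetical. Only the jdbc.schema-version key and its V1 value come from the note; the remaining keys are assumptions about a typical setup.

```python
def spark_catalog_conf(catalog_name, properties):
    """Prefix catalog properties as spark.sql.catalog.<name>.<key> (hypothetical helper)."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    # The bare prefix key names the Spark catalog implementation class.
    conf = {prefix: "org.apache.iceberg.spark.SparkCatalog"}
    for key, value in properties.items():
        conf[f"{prefix}.{key}"] = value
    return conf


# Placeholder values; adjust for a real deployment.
jdbc_catalog_properties = {
    "type": "jdbc",
    "uri": "jdbc:postgresql://localhost:5432/iceberg",
    "warehouse": "s3://example-bucket/warehouse",
    "jdbc.schema-version": "V1",  # enables view support, per the note
}

conf = spark_catalog_conf("my_catalog", jdbc_catalog_properties)
for key in sorted(conf):
    print(f"{key}={conf[key]}")
```

Each printed line can be passed to Spark as a --conf entry or placed in spark-defaults.conf.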
New Contributors
- @reswqa made their first contribution in #7745
- @maxdebayser made their first contribution in #7796
- @mderoy made their first contribution in #7801
- @cxzl25 made their first contribution in #7825
- @tilman151 made their first contribution in #7781
- @TaoZex made their first contribution in #7761
- @Rondiz made their first contribution in #7829
- @grobgl made their first contribution in #7645
- @guiyanakuang made their first contribution in #7839
- @littlecatjianjiao made their first contribution in #7908
- @DaVincii made their first contribution in #7874
- @mumuhhh made their first contribution in #7866
- @Ewan-Keith made their first contribution in #7917
- @nikam14 made their first contribution in #7093
- @hsiang-c made their first contribution in #7920
- @ktk1012 made their first contribution in #8026
- @joan38 made their first contribution in #8002
- @coded9 made their first contribution in #8058
- @rustyconover made their first contribution in #8074
- @mr-brobot made their first contribution in #8061
- @Neuw84 made their first contribution in #7988
- @lintingbin made their first contribution in #8111
- @mrcnc made their first contribution in #8193
- @s-akhtar-baig made their first contribution in #8205
- @MaxNevermind made their first contribution in #7694
- @bmaisonn made their first contribution in #8209
- @HonahX made their first contribution in #8215
- @onerishabh made their first contribution in #8214
- @kengtin made their first contribution in #7161
- @aless10 made their first contribution in #8286
- @advancedxy made their first contribution in #8320
- @dacort made their first contribution in #8341
- @gegef2009 made their first contribution in #8154
- @TjuAachen made their first contribution in #8401
- @baiyangtx made their first contribution in #8416
- @hiteshbedre made their first contribution in #8491
- @harshm-dev made their first contribution in #8385
- @wForget made their first contribution in #8445
- @andreacfm made their first contribution in #8528
- @Paddy0523 made their first contribution in #8547
- @rushilshah1 made their first contribution in #8589
- @lanemoseley made their first contribution in #8618
- @tlm365 made their first contribution in #8447
- @jbonofre made their first contribution in #8612
- @jayceslesar made their first contribution in #8558
- @MehulBatra made their first contribution in #8408
- @clettieri made their first contribution in #8192
- @nk1506 made their first contribution in #8640
- @johanhenriksson made their first contribution in #8751
- @ashutosh-roy made their first contribution in #8707
- @Priyansh121096 made their first contribution in #8748
- @PickBas made their first contribution in #8819
- @jongwooo made their first contribution in #8666
- @rice668 made their first contribution in #8873
- @geruh made their first contribution in #8914
- @bknbkn made their first contribution in #8868
- @wangtaohz made their ...
Apache Iceberg 1.4.3
What's Changed
- Core: Scan only live entries in partitions table (#8969) by @Fokko in #9197
- [1.4.x] Core: Fix missing files from transaction retries with conflicting manifest merges (#9230) by @nastra in #9337
- [1.4.x] JDBC Catalog: Fix namespaceExists check with special characters (#8340) by @ismailsimsek in #9291
- [1.4.x] Core: Expired Snapshot files in a transaction should be deleted by @bartash in #9223
- [1.4.x] Core: Fix missing delete files from transaction (#9354) by @nastra in #9356
Full Changelog: apache-iceberg-1.4.2...apache-iceberg-1.4.3
Apache Iceberg 1.4.2
What's Changed
- Core: Ignore split offsets array when split offset is past file length by @amogh-jahagirdar in #8938
Full Changelog: apache-iceberg-1.4.1...apache-iceberg-1.4.2
Apache Iceberg 1.4.1
What's Changed
- Core: Do not use a lazy split offset list in manifests (#8834) by @nastra in #8845
- Core: Ignore split offsets when the last split offset is past the file length by @amogh-jahagirdar in #8861
- AWS: avoid static global credentials provider which doesn't play well with lifecycle management (#8677) by @nastra in #8843
- Flink: Reverting the default custom partitioner for bucket column (#8848) by @nastra in #8858
Full Changelog: apache-iceberg-1.4.0...apache-iceberg-1.4.1
Apache Iceberg 1.4.0
- API
- Core
- Use V2 format by default in new tables (#8381)
- Use zstd compression for Parquet by default in new tables (#8593)
- Add strict metadata cleanup mode and enable it by default (#8397) (#8599)
- Avoid generating huge manifests during commits (#6335)
- Add a writer for unordered position deletes (#7692)
- Optimize DeleteFileIndex (#8157)
- Optimize lookup in DeleteFileIndex without useful bounds (#8278)
- Optimize split offsets handling (#8336)
- Optimize computing user-facing state in data tasks (#8346)
- Don't persist useless file and position bounds for deletes (#8360)
- Don't persist counts for paths and positions in position delete files (#8590)
- Support setting system-level properties via environmental variables (#5659)
- Add JSON parser for ContentFile and FileScanTask (#6934)
- Add REST spec and request for commits to multiple tables (#7741)
- Add REST API for committing changes against multiple tables (#7569)
- Default to exponential retry strategy in REST client (#8366)
- Support registering tables with REST session catalog (#6512)
- Add last updated timestamp and snapshot ID to partitions metadata table (#7581)
- Add total data size to partitions metadata table (#7920)
- Extend ResolvingFileIO to support bulk operations (#7976)
- Key metadata in Avro format (#6450)
- Add AES GCM encryption stream (#3231)
- Fix a connection leak in streaming delete filters (#8132)
- Fix lazy snapshot loading history (#8470)
- Fix unicode handling in HTTPClient (#8046)
- Fix paths for unpartitioned specs in writers (#7685)
- Fix OOM caused by Avro decoder caching (#7791)
- Spark
- Added support for Spark 3.5
- Code for DELETE, UPDATE, and MERGE commands has moved to Spark, and all related extensions have been dropped from Iceberg.
- Support for WHEN NOT MATCHED BY SOURCE clause in MERGE.
- Column pruning in merge-on-read operations.
- Ability to request a bigger advisory partition size for the final write to produce well-sized output files without harming the job parallelism.
- Dropped support for Spark 3.1
- Deprecated support for Spark 3.2
- Support vectorized reads for merge-on-read operations in Spark 3.4 and 3.5 (#8466)
- Increase default advisory partition size for writes in Spark 3.5 (#8660)
- Support distributed planning in Spark 3.4 and 3.5 (#8123)
- Support pushing down system functions by V2 filters in Spark 3.4 and 3.5 (#7886)
- Support fanout position delta writers in Spark 3.4 and 3.5 (#7703)
- Use fanout writers for unsorted tables by default in Spark 3.5 (#8621)
- Support multiple shuffle partitions per file in compaction in Spark 3.4 and 3.5 (#7897)
- Output net changes across snapshots for carryover rows in CDC (#7326)
- Display read metrics on Spark SQL UI (#7447) (#8445)
- Adjust split size to benefit from cluster parallelism in Spark 3.4 and 3.5 (#7714)
- Add fast_forward procedure (#8081)
- Support filters when rewriting position deletes (#7582)
- Support setting current snapshot with ref (#8163)
- Make backup table name configurable during migration (#8227)
- Add write and SQL options to override compression config (#8313)
- Correct partition transform functions to match the spec (#8192)
- Enable extra commit properties with metadata delete (#7649)
- Flink
- Add possibility of ordering the splits based on the file sequence number (#7661)
- Fix serialization in TableSink with anonymous object (#7866)
- Switch to FileScanTaskParser for JSON serialization of IcebergSourceSplit (#7978)
- Custom partitioner for bucket partitions (#7161)
- Implement data statistics coordinator to aggregate data statistics from operator subtasks (#7360)
- Support alter table column (#7628)
- Parquet
- ORC
- Handle filters with transforms by assuming the filter matches (#8244)
- Vendor Integrations
- GCP: Fix single byte read in GCSInputStream (#8071)
- GCP: Add properties for OAuth2 and update library (#8073)
- GCP: Add prefix and bulk operations to GCSFileIO (#8168)
- GCP: Add bundle jar for GCP-related dependencies (#8231)
- GCP: Add range reads to GCSInputStream (#8301)
- AWS: Add bundle jar for AWS-related dependencies (#8261)
- AWS: support config storage class for S3FileIO (#8154)
- AWS: Add FileIO tracker/closer to Glue catalog (#8315)
- AWS: Update S3 signer spec to allow an optional string body in S3SignRequest (#8361)
- Azure: Add FileIO that supports ADLSv2 storage (#8303)
- Azure: Make ADLSFileIO implement DelegateFileIO (#8563)
- Nessie: Provide better commit message on table registration (#8385)
- Dependencies
- Bump Nessie to 0.71.0
- Bump ORC to 1.9.1
- Bump Arrow to 12.0.1
- Bump AWS Java SDK to 2.20.131
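Because 1.4.0 changes the defaults for new tables (V2 format and zstd compression for Parquet), a table that must keep the earlier behavior can set the properties explicitly at creation time. The sketch below renders a Spark SQL CREATE TABLE statement with such overrides. The table name, columns, and the helper create_table_sql are hypothetical. The property keys format-version and write.parquet.compression-codec are assumed to be the relevant Iceberg table properties, and gzip is assumed to be the earlier Parquet codec default; verify both against the table configuration documentation for the version in use.

```python
def create_table_sql(table, columns, properties):
    """Render CREATE TABLE ... USING iceberg with TBLPROPERTIES (hypothetical helper)."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    # Sort the properties so the generated statement is deterministic.
    props = ", ".join(f"'{k}'='{v}'" for k, v in sorted(properties.items()))
    return f"CREATE TABLE {table} ({cols}) USING iceberg TBLPROPERTIES ({props})"


# Assumed pre-1.4.0 defaults, pinned explicitly for a new table.
legacy_defaults = {
    "format-version": "1",
    "write.parquet.compression-codec": "gzip",
}

sql = create_table_sql(
    "db.events",
    [("id", "bigint"), ("ts", "timestamp")],
    legacy_defaults,
)
print(sql)
```

Tables that already exist keep their stored properties, so overrides like these matter only for tables created after the upgrade.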
Apache Iceberg 1.3.1
What's Changed
- Hive: Set commit state as Unknown before throwing CommitStateUnknownException by @nastra in #8029
- Spark 3.4: WAP branch not propagated when using DELETE without WHERE by @nastra in #8028
- Core: Include all reachable snapshots with v1 format and REF snapshot mode by @nastra in #8027
- Spark 3.3: Backport 'WAP branch not propagated when using DELETE without WHERE' by @nastra in #8036
- Flink: Remove the creation of default database in FlinkCatalog by @Fokko in #8039
- Core: Handle optional fields by @Fokko in #8064
- Core: Abort file groups should be under same lock as committerService by @ConeyLiu in #8060
- Spark 3.3: Fix rewrite_position_deletes for certain partition types by @szehon-ho in #8069
- Spark 3.4: Fix rewrite_position_deletes for certain partition types by @szehon-ho in #8059
Full Changelog: apache-iceberg-1.3.0...apache-iceberg-1.3.1