Releases: Eventual-Inc/Daft
v0.2.24
Changes
✨ New Features
- [FEAT] Allow returning of pyarrow arrays from UDFs @jaychia (#2252)
- [FEAT] Add left, right, and outer joins @kevinzwang (#2166)
- [FEAT] Add rpad and lpad expressions @murex971 (#2157)
- [FEAT] AWS Profile override in S3Config @samster25 (#2243)
- [FEAT] Add unpivot @kevinzwang (#2204)
- [FEAT] Add string repeat functionality @murex971 (#2198)
- [FEAT] Approximate quantile aggregation (pulled into main) @jaychia (#2179)
- [FEAT] pivot @colin-ho (#2183)
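For readers unfamiliar with the join types added in #2166, here is a minimal plain-Python sketch of what left, right, and outer joins return. It does not use Daft's API and is not how Daft implements joins. The helper `join_rows` and the list-of-dicts row format are assumptions made purely for illustration.

```python
def join_rows(left, right, key, how="inner"):
    """Join two lists of dict rows on `key`.

    Hypothetical helper for illustration only; not part of Daft.
    """
    if how == "right":
        # A right join is a left join with the two sides swapped.
        return join_rows(right, left, key, how="left")

    left_cols = {c for row in left for c in row}
    right_cols = {c for row in right for c in row}
    out = []
    matched_right = set()

    for l_row in left:
        hits = [(i, r) for i, r in enumerate(right) if r[key] == l_row[key]]
        for i, r_row in hits:
            matched_right.add(i)
            out.append({**r_row, **l_row})
        if not hits and how in ("left", "outer"):
            # Keep the unmatched left row, filling the other side's columns with None.
            out.append({**{c: None for c in right_cols}, **l_row})

    if how == "outer":
        # An outer join also keeps right rows that never matched.
        for i, r_row in enumerate(right):
            if i not in matched_right:
                out.append({**{c: None for c in left_cols}, **r_row})
    return out


users = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
orders = [{"id": 2, "total": 10}, {"id": 3, "total": 20}]

left_result = join_rows(users, orders, "id", how="left")
right_result = join_rows(users, orders, "id", how="right")
outer_result = join_rows(users, orders, "id", how="outer")
```

In this sketch a left join keeps every `users` row, a right join keeps every `orders` row, and an outer join keeps both, with `None` standing in for missing values.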
🚀 Performance Improvements
- [PERF] Adaptive Query Execution @samster25 (#2176)
- [PERF]: swap out json_deserializer for simd_json @universalmind303 (#2228)
- [PERF] Evaluate only true/false side of if_else if predicate is boolean @colin-ho (#2222)
- [PERF] enable metadata preservation across materialization points @samster25 (#2216)
👾 Bug Fixes
- [BUG] Fix tab completion on expression namespaced accessors @jaychia (#2251)
- [BUG] route abfss to AzureBlob @samster25 (#2244)
📖 Documentation
- [CHORE] Skip demo notebook @jaychia (#2248)
- [FEAT] Add rpad and lpad expressions @murex971 (#2157)
- [DOCS] Add user guide for read_sql @colin-ho (#2226)
- [FEAT] Add unpivot @kevinzwang (#2204)
- [DOCS] Add read_hudi in the api docs @xushiyan (#2225)
- [FEAT] Add string repeat functionality @murex971 (#2198)
- [DOCS] LinkedIn Big Data meetup tutorial @jaychia (#2223)
- [FEAT] Approximate quantile aggregation (pulled into main) @jaychia (#2179)
- [DOCS] Add read_lance docs @jaychia (#2218)
- [FEAT] pivot @colin-ho (#2183)
🧰 Maintenance
- [CHORE] Drop Python 3.7 @samster25 (#2250)
- [CHORE] Improve timestamp repr @colin-ho (#2245)
- [CHORE] Allow multiple group_bys for pivot @colin-ho (#2242)
- [CHORE] Skip demo notebook @jaychia (#2248)
- [CHORE] Return &str for expression name @colin-ho (#2224)
- [CHORE] Mount provision.py for iceberg integration tests @jaychia (#2232)
- [CHORE]: remove trait aliases @universalmind303 (#2229)
⬆️ Dependencies
- Bump serde from 1.0.198 to 1.0.200 @dependabot (#2239)
- Bump csv-async from 1.2.6 to 1.3.0 @dependabot (#2238)
v0.2.23
Changes
👾 Bug Fixes
- [BUG] Propagate errors when hitting them in parquet byte stream @samster25 (#2214)
🧰 Maintenance
- [CHORE]: statically link liblzma @universalmind303 (#2213)
v0.2.22
Changes
This is the last release that will support Python 3.7, which has been EOL for about a year now.
✨ New Features
- [FEAT] Rust side exceptions for Transient Errors @samster25 (#2197)
- [FEAT] Enable anonymous S3 access for Delta @jaychia (#2206)
- [FEAT] Improve Hudi support for more scenarios @xushiyan (#2149)
- [FEAT] try python 3.11 for releases @samster25 (#2184)
- [FEAT] Timestamp Truncation @colin-ho (#2158)
- [FEAT] Enhance temporal arithmetic functionalities @colin-ho (#2146)
- [FEAT] Add logarithmic expressions @murex971 (#2168)
- [PERF] Move with_column and exclude function logic to Rust side, add with_columns @kevinzwang (#2167)
- [FEAT] Allow for variadic kwargs in UDFs @jaychia (#2162)
- [FEAT] hide stack traces of wrappers for pytest / ipython @samster25 (#2159)
- [FEAT] Improve query building in read_sql @colin-ho (#2144)
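As a rough illustration of what timestamp truncation (#2158) means, the sketch below floors a timestamp to a multiple of a fixed interval using only the Python standard library. It is not Daft's expression API or implementation. The function name `truncate` and the `origin` default are assumptions chosen for this example.

```python
from datetime import datetime, timedelta


def truncate(ts, interval, origin=datetime(1970, 1, 1)):
    """Floor `ts` to a whole multiple of `interval`, counted from `origin`.

    Hypothetical helper for illustration only; works on naive datetimes.
    """
    # timedelta // timedelta yields an int, rounded toward negative infinity.
    steps = (ts - origin) // interval
    return origin + steps * interval


ts = datetime(2024, 4, 15, 13, 47, 12)
by_hour = truncate(ts, timedelta(hours=1))
by_quarter_hour = truncate(ts, timedelta(minutes=15))
by_day = truncate(ts, timedelta(days=1))
```

Here `by_hour` is 13:00:00, `by_quarter_hour` is 13:45:00, and `by_day` is midnight, all on 2024-04-15.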
🚀 Performance Improvements
- [PERF] Move with_column and exclude function logic to Rust side, add with_columns @kevinzwang (#2167)
- [PERF] Refactor TreeNode to be native to Arc<TreeNode> @samster25 (#2175)
👾 Bug Fixes
- [FEAT] Improve Hudi support for more scenarios @xushiyan (#2149)
- [BUG] bump arrow2 to use copy ptr instead of from_raw_parts @samster25 (#2194)
- [BUG] Fix empty inputs case for string kernels @jaychia (#2165)
- [BUG] Fix tuple inputs in UDF @jaychia (#2161)
📖 Documentation
- [DOCS] Add Hudi integration entry @xushiyan (#2208)
- [FEAT] Improve Hudi support for more scenarios @xushiyan (#2149)
- [FEAT] Timestamp Truncation @colin-ho (#2158)
- [FEAT] Add logarithmic expressions @murex971 (#2168)
- [DOCS] add Dask migration guide @avriiil (#2169)
- [CHORE] enable codespell and fix misspellings @samster25 (#2177)
- [PERF] Move with_column and exclude function logic to Rust side, add with_columns @kevinzwang (#2167)
🧰 Maintenance
- [CHORE] Upgrade Rust toolchain to 2024 04 01 @samster25 (#2192)
- [CHORE] upgrade dask again for publish pipeline @samster25 (#2193)
- [CHORE] remove fsspec http filesystem @samster25 (#2191)
- [CHORE] add ray min version for windows to remove upper pin on pyarrow @samster25 (#2189)
- [CHORE] upgrade dask for tests for 3.11 @samster25 (#2186)
- [CHORE] disable macos check for testing in publish pipeline @samster25 (#2185)
- [CHORE] upgrade publish pipeline python version to 3.12 @samster25 (#2182)
- [CHORE] use python 3.9 for publishing @samster25 (#2181)
- [CHORE] enable codespell and fix misspellings @samster25 (#2177)
- [CHORE] Add usr msg for df.explain(show_all=True) @avriiil (#2081)
- [CHORE] improve recursive listing error msg @avriiil (#2145)
- [CHORE] Enforce physical types in logical array @jaychia (#2160)
- [CHORE] Fix style build on main @jaychia (#2163)
- [CHORE] unify expr children around expr ref @samster25 (#2156)
- [CHORE] Rearrange modules in daft-plan crate @samster25 (#2151)
- [CHORE] collect all tests before running pytest so check for import errors @samster25 (#2150)
- [CHORE] Bottom Up Logical To Physical translation @samster25 (#2147)
⬆️ Dependencies
- Bump comfy-table from 7.1.0 to 7.1.1 @dependabot (#2199)
- Bump num-traits from 0.2.17 to 0.2.18 @dependabot (#2200)
- Bump slackapi/slack-github-action from 1.25.0 to 1.26.0 @dependabot (#2174)
- Bump regex from 1.10.3 to 1.10.4 @dependabot (#2172)
- Bump serde_json from 1.0.108 to 1.0.116 @dependabot (#2173)
v0.2.21
Changes
✨ New Features
- [FEAT] Add S3Config.from_env functionality @jaychia (#2137)
- deltalake _delta_lake.py: Allow Glue catalog cross account access @pang-wu (#2113)
- [FEAT] Enable Ruff @samster25 (#2121)
- [FEAT] Implements other trigonometry expressions @MeepoWin (#2123)
- [FEAT] exp expression implementation @MeepoWin (#2115)
- [FEAT] sin/cos/tan expression implementation @reswqa (#2112)
- [CHORE] Using uv in MakeFile @MeepoWin (#2114)
- [FEAT] Add option to S3Config to force virtual addressing @samster25 (#2106)
- [FEAT] fill_null expression @colin-ho (#2089)
- [FEAT] Add basic list aggregations @kevinzwang (#2032)
- [FEAT] Allow sql alchemy connection factory as input to read_sql @colin-ho (#2071)
- [FEAT] Add daft-sketch subcrate and arrow2 serialization functionality @jaychia (#2090)
📖 Documentation
- [CHORE] Fix underlines in README @jaychia (#2143)
- [DOCS] Update iceberg integration docs to add writes @jaychia (#2110)
- [DOCS] Create CODE_OF_CONDUCT.md @samster25 (#2101)
- [CHORE] Skip deltalake notebooks for CI @jaychia (#2097)
- [CHORE] Add link to good first issues in readme @colin-ho (#2088)
- [DOCS] Fix docs typo @avriiil (#2075)
- [DOCS] Typos in user guide @avriiil (#2079)
- [DOCS] Fix typos on 10-min tutorial @avriiil (#2082)
- [DOCS] Add ml batch inference tutorials @jaychia (#2057)
- [CHORE] Fix autolabeller CI step for forks @jaychia (#2138)
🧰 Maintenance
- [CHORE] Fix underlines in README @jaychia (#2143)
- [CHORE] Split labelling and update release CI steps @jaychia (#2142)
- [CHORE] Fix the labeller CI step which is not triggering @jaychia (#2141)
- [CHORE] Fixing readthedocs build @jaychia (#2135)
- [CHORE] Fix documentation build with uv @jaychia (#2134)
- [CHORE] Fix build command @MeepoWin (#2126)
- [CHORE] Rename virtual env folder to .venv @MeepoWin (#2122)
- [CHORE] refactors for ruff [1/n] @samster25 (#2120)
- [CHORE] FunctionExpr and exp @samster25 (#2119)
- [CHORE] FunctionEvaluator directly receive FunctionExpr @MeepoWin (#2117)
- [CHORE] Update .gitignore for JetBrains IDE and pyenv user @MeepoWin (#2116)
- [CHORE] Refactor string kernels @colin-ho (#2087)
- [CHORE] Skip deltalake notebooks for CI @jaychia (#2097)
- [CHORE] Add link to good first issues in readme @colin-ho (#2088)
- [CHORE] fix empty data and pattern case in str expressions @murex971 (#2085)
⬆️ Dependencies
- Bump bytes from 1.5.0 to 1.6.0 @dependabot (#2131)
- Bump futures from 0.3.28 to 0.3.30 @dependabot (#2130)
- Bump isbang/compose-action from 1.5.1 to 2.0.0 @dependabot (#2091)
- Bump dyn-clone from 1.0.16 to 1.0.17 @dependabot (#2093)
- Bump tokio from 1.33.0 to 1.37.0 @dependabot (#2092)
v0.2.20
Changes
✨ New Features
- [FEAT] improve error message for s3 streaming error @samster25 (#2055)
- [FEAT] Add str.replace expression @colin-ho (#2048)
- [FEAT] Enable str.split using regex pattern @colin-ho (#2044)
- [FEAT] Support Hudi reader @xushiyan (#2011)
- [FEAT] Add find functionality for string @murex971 (#2046)
- [FEAT] round expression implementation @sherlockbeard (#2041)
- [FEAT] Add str.extract_all expression @colin-ho (#2038)
- [FEAT] Add str.right() function @murex971 (#2031)
- [FEAT] Sign expression implementation @sherlockbeard (#2037)
- [FEAT] drop psutil in favor of our own tool @samster25 (#2035)
- [FEAT] Allow passing on_error="null" to ignore decoding errors in image decode @jaychia (#2033)
- [FEAT] Add str.extract() function @colin-ho (#2020)
- [FEAT] Add str.left() function @murex971 (#2027)
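To make the difference between the extract-style string expressions listed above (#2020, #2038) concrete, here is a small sketch using Python's standard `re` module. It only illustrates the general idea of "first match" versus "all matches". The function names, the `group` parameter, and the null handling are assumptions for this example and are not taken from Daft's API.

```python
import re


def extract(value, pattern, group=0):
    """Return the first regex match in `value`, or None if nothing matches.

    Hypothetical helper for illustration only.
    """
    m = re.search(pattern, value)
    return m.group(group) if m else None


def extract_all(value, pattern, group=0):
    """Return every non-overlapping regex match in `value` as a list."""
    return [m.group(group) for m in re.finditer(pattern, value)]


first_number = extract("a1b22c333", r"\d+")
all_numbers = extract_all("a1b22c333", r"\d+")
no_match = extract("abc", r"\d+")
values_only = extract_all("k1=v1;k2=v2", r"(\w+)=(\w+)", group=2)
```

Passing a non-zero `group` selects a capture group instead of the whole match, as in `values_only`.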
🚀 Performance Improvements
- [PERF] [Delta Lake] Add IO multithreading arg to daft.read_delta_lake(). @clarkzinzow (#2029)
👾 Bug Fixes
- [BUG] Allow for writes to s3a and s3n paths @jaychia (#2054)
- [BUG] Fix if_else series naming from predicate broadcast @colin-ho (#2051)
- [BUG] enable dependabot for iceberg int tests @samster25 (#2042)
- [BUG] Fix all-null ImageArray length issues @jaychia (#2034)
- [BUG] produce only a single sdist @samster25 (#2078)
📖 Documentation
- [FEAT] Add str.replace expression @colin-ho (#2048)
- [DOCS] Add docs for write_iceberg @jaychia (#2053)
- [FEAT] Add str.extract_all expression @colin-ho (#2038)
- [FEAT] Add str.extract() function @colin-ho (#2020)
- [CHORE] Add global aggregation docs and error on improper aggregation usage @kevinzwang (#2025)
- [DOCS] Fix typos and broken links @kaytsui (#2052)
🧰 Maintenance
- [CHORE] Remove autouse from gen_tpch fixture @colin-ho (#2049)
- [CHORE] Refactor sql tpch tests @colin-ho (#2047)
- [CHORE] Add tpch test for read sql @colin-ho (#2026)
- [CHORE] Add column range stats from read_sql @colin-ho (#2015)
- [CHORE] upgrade upload/download artifact github action @samster25 (#2043)
- [CHORE] Exclude Twitter and LinkedIn from broken link checker @colin-ho (#2030)
- [CHORE] Add global aggregation docs and error on improper aggregation usage @kevinzwang (#2025)
- [CHORE] Use docker compose instead of docker-compose in builds @jaychia (#2077)
⬆️ Dependencies
- Bump release-drafter/release-drafter from 5 to 6 @dependabot (#2065)
- Bump nick-fields/retry from 2 to 3 @dependabot (#2066)
- Bump openssl-sys from 0.9.93 to 0.9.102 @dependabot (#2062)
- Bump async-trait from 0.1.74 to 0.1.79 @dependabot (#2061)
- Bump base64 from 0.21.5 to 0.22.0 @dependabot (#2059)
- Bump async-compression from 0.4.5 to 0.4.7 @dependabot (#2060)
- Bump lxml from 4.9.3 to 5.1.0 @dependabot (#1764)
- Bump slackapi/slack-github-action from 1.24.0 to 1.25.0 @dependabot (#1822)
- Bump actions/setup-python from 4 to 5 @dependabot (#1717)
- Bump conda-incubator/setup-miniconda from 2 to 3 @dependabot (#1666)
v0.2.19
Changes
✨ New Features
- [FEAT] iceberg writes unpartitioned @samster25 (#2016)
- [FEAT] Add str.match() function @colin-ho (#2007)
- [FEAT] read_sql @colin-ho (#1943)
👾 Bug Fixes
- [BUG] Fix connector-x and psycopg dependencies for CI @colin-ho (#2017)
- [BUG] Disable Numeric and String comparison @samster25 (#2019)
- [BUG] deltalake read pq splitting bug @jaychia (#2013)
📖 Documentation
- [DOCS] [Hotfix] [Delta Lake] Break data skipping optimizations into different section. @clarkzinzow (#2018)
- [CHORE] Fix is_in docs @colin-ho (#2022)
- [FEAT] Add str.match() function @colin-ho (#2007)
- [FEAT] read_sql @colin-ho (#1943)
v0.2.18
Changes
✨ New Features
- [FEAT] Top level global expressions @kevinzwang (#2000)
- [FEAT] Add str.capitalize() function @murex971 (#2003)
- [FEAT] Support reading Parquet files with Field ID @jaychia (#1990)
- [FEAT] Enable JQ style JSON accessors on strings @colin-ho (#2001)
- [FEAT] [Catalogs] [Delta Lake] Add support for AWS Glue Catalog and Databricks Unity Catalog integrations to Delta Lake reader @clarkzinzow (#1991)
- [FEAT] Enable UDF to handle arbitrary number of Daft series @gmweaver (#1984)
👾 Bug Fixes
- [BUG] skip metadata check for field equality @samster25 (#2006)
- [BUG] Fix struct getters on logical types @jaychia (#2008)
- [BUG] Filter out marker files from glob scan @colin-ho (#1999)
📖 Documentation
- [DOCS] Fix stale docs in df.write_csv @jaychia (#2012)
- [FEAT] Enable JQ style JSON accessors on strings @colin-ho (#2001)
🧰 Maintenance
- [CHORE] [Hotfix] Remove pyarrow upper bound for Windows. @clarkzinzow (#2002)
- [CHORE] [Catalogs] [Delta Lake] Add test coverage for Delta Lake reads on Azure. @clarkzinzow (#1970)
- [CHORE] [Repartitioning] Refactor + hide PartitionSpec and rename to ClusteringSpec. @clarkzinzow (#1961)
- [CHORE] Simplify cast to schema @jaychia (#1982)
- [CHORE] Disables anonymous mode for S3 accesses in DeltaLake @jaychia (#1975)
- [CHORE] Set DAFT_ANALYTICS_ENABLED=0 in nightly tests @jaychia (#1972)
v0.2.17
Changes
✨ New Features
- [FEAT] Add str.reverse() function @nsalerni (#1957)
- [FEAT] Add str.lower() function @nsalerni (#1938)
- [FEAT] MapArray @colin-ho (#1959)
- [FEAT] any_value groupby aggregation @kevinzwang (#1941)
- [FEAT] adding floor function @chandbud5 (#1960)
- [FEAT] Expose coerce_int96_timestamp_unit flag on top level daft.read_parquet call @samster25 (#1936)
- [FEAT] Time Array @colin-ho (#1892)
- [FEAT] Add str.lstrip() and str.rstrip() functions @nsalerni (#1944)
- [FEAT] Add str.upper() function @nsalerni (#1942)
🚀 Performance Improvements
- [PERF] scan task in memory estimate @samster25 (#1901)
- [PERF] Spread scan tasks over Ray cluster. @clarkzinzow (#1950)
📖 Documentation
- [DOCS] [Delta Lake] Add user guide for Delta Lake reads. @clarkzinzow (#1969)
- [Catalogs] [Delta Lake] Add initial support for reading from Delta Lake. @clarkzinzow (#1879)
- [DOCS] Fix notebooks by falling back on null for URL downloads @jaychia (#1951)
- [DOCS] Add documentation for using and developing Daft on Ray @kevinzwang (#1896)
- [DOCS] Update schema hints documentation @jaychia (#1935)
v0.2.16
Changes
✨ New Features
- [FEAT] perform head operation instead of list when given a file without regex or / @samster25 (#1891)
🚀 Performance Improvements
- [PERF] Parallel glob @samster25 (#1897)
v0.2.15
Changes
👾 Bug Fixes
- [BUG] dont create dirs if non local fs @samster25 (#1888)
- [BUG] Fix Ray autoscaling from zero worker CPUs @kevinzwang (#1884)
- [BUG] Attempt to skip IMDS if region or credentials are provided @samster25 (#1886)
- [BUG] [Query Planner] Properly track ascending/descending sort order for range partitioning and sorting. @clarkzinzow (#1862)
- [BUG] Fix bug with merge tasks that allows for tasks larger than max size allowed @samster25 (#1882)