Releases: Eventual-Inc/Daft
v0.3.10
Changes
✨ New Features
- [FEAT] Overwrite mode for write parquet/csv @colin-ho (#3108)
- [FEAT] Support null equal safe join in SQL @advancedxy (#3166)
- [FEAT] Streaming Catalog Writes @colin-ho (#3160)
- [FEAT] Infer Azure storage account from uri @kevinzwang (#3165)
- [FEAT] Support null safe equal in joins @advancedxy (#3161)
- [FEAT] Support hive partitioned reads @desmondcheongzx (#3029)
- [FEAT] Add better detection of Ray Job environment @jaychia (#3148)
- [FEAT] Streaming physical writes for native executor @colin-ho (#2992)
- [FEAT]: Throw error for invalid ** usage outside folder segments (e.g. /tmp/**.csv) @conradsoon (#3100)
- [FEAT]: sql concat and stddev @universalmind303 (#3153)
- [FEAT]: Sql common table expressions (CTE's) @universalmind303 (#3137)
- [FEAT] enable decimal between @samster25 (#3154)
- [FEAT] dec128 math @samster25 (#3143)
- [FEAT] Support SQL INTERVAL @austin362667 (#3146)
- [FEAT] Swordfish Stateful UDF support @kevinzwang (#3127)
- [FEAT]: sql cross join @universalmind303 (#3110)
- [FEAT] Add floor division @ConeyLiu (#3064)
- [FEAT] Compute pool for native executor @colin-ho (#2986)
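The `**` rule from #3100 above can be illustrated with a minimal sketch in plain Python. This is not Daft's implementation: the function name `validate_double_star` is hypothetical, and the rule encoded here (a `**` is only accepted when it makes up an entire path segment) is an assumption inferred from the entry's own example, `/tmp/**.csv`.

```python
def validate_double_star(pattern: str) -> None:
    """Raise ValueError if '**' shares a path segment with other characters.

    Hypothetical helper for illustration only; it is not part of Daft.
    """
    for segment in pattern.split("/"):
        # A recursive wildcard is only meaningful as a whole folder segment.
        if "**" in segment and segment != "**":
            raise ValueError(
                f"invalid '**' in segment {segment!r} of {pattern!r}: "
                "'**' must be an entire path segment"
            )


# '**' is a full segment here, so the pattern is accepted.
validate_double_star("/tmp/**/*.csv")

# '**' is fused with '.csv' here, so the pattern is rejected.
try:
    validate_double_star("/tmp/**.csv")
except ValueError as err:
    print(err)
```

Under this assumed rule, `/tmp/**/*.csv` is the usual way to express "every CSV file under /tmp, at any depth".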
🚀 Performance Improvements
- [PERF] Add a parallel local CSV reader @desmondcheongzx (#3055)
👾 Bug Fixes
- [BUG]: Sql groupby and orderby with aliases and projections @universalmind303 (#3177)
- [BUG] Separate PartitionTask done from results @jaychia (#3155)
- [BUG]: between panic on unsupported types @universalmind303 (#3150)
- [BUG] fix type widening for rem @samster25 (#3131)
📖 Documentation
- Temporal docs added to expressions.rst @sunaysanghani (#2487)
- [DOCS] Update banner on README.rst @ccmao1130 (#3130)
- [DOCS] Update Daft logo @ccmao1130 (#3129)
🧰 Maintenance
- [CHORE] Add tests for decimal casting @desmondcheongzx (#3179)
- [CHORE] Refactor RayRunner so that we can add tracing @jaychia (#3163)
- [CHORE] Swordfish specific test fixtures @colin-ho (#3164)
- [CHORE]: tpc-ds datagen @universalmind303 (#3103)
- [CHORE] Cancel tasks spawned on compute runtime @colin-ho (#3128)
- [CHORE] Enable debug in test profile @advancedxy (#3135)
- [FEATURE] add min_hash alternate hashers @andrewgazelka (#3052)
- [CHORE] (Revert:) Add rust cache to s3 build artifacts action @jaychia (#3147)
- [CHORE] Add rust cache to s3 build artifacts action @jaychia (#3144)
- [CHORE] Refactor shuffles to use a unified ShuffleExchange PhysicalPlan variant @jaychia (#3083)
⬆️ Dependencies
- Bump orjson from 3.9.5 to 3.10.11 @dependabot (#3176)
- Bump adlfs from 2023.10.0 to 2024.7.0 @dependabot (#2547)
- Bump image from 0.24.9 to 0.25.4 @dependabot (#3088)
- Bump slackapi/slack-github-action from 1.26.0 to 1.27.0 @dependabot (#2776)
v0.3.9
Changes
✨ New Features
- [FEAT]: sql IN operator @universalmind303 (#3086)
- [FEAT] Enable explode for swordfish @colin-ho (#3077)
- [FEAT]: add sql DISTINCT @universalmind303 (#3087)
- [FEAT] Enable concat for swordfish @colin-ho (#2976)
- [FEAT] Enable unpivot for swordfish @colin-ho (#3078)
- [FEAT] Outer joins for native executor @colin-ho (#2860)
- [FEAT] Enable pivot for swordfish @colin-ho (#3081)
- [FEAT] Enable sample for swordfish @colin-ho (#3079)
- [FEAT] Add stateful actor context and set CUDA_VISIBLE_DEVICES @kevinzwang (#3002)
- [FEAT]: sql tbl alias, and compound ident for joins @universalmind303 (#3066)
- [FEAT]: sql between @universalmind303 (#3062)
- [FEAT]: Interval dtype @universalmind303 (#3018)
- [FEAT] Enable to_json_string for physical plan @colin-ho (#3023)
- [FEAT]: Daft support for Azure storage for Unity Catalog daft.read_deltalake @anilmenon14 (#3025)
- [FEAT] Iceberg MOR for streaming parquet @colin-ho (#2975)
- [FEAT] Include file paths as column from read_parquet/csv/json @colin-ho (#2953)
🚀 Performance Improvements
- [PERF] Remove stateful actor child materialization limit @kevinzwang (#3099)
👾 Bug Fixes
- [BUG] Bump up max_header_size @raunakab (#3068)
- [BUG] Autodetect AWS region during deltalake scan @kevinzwang (#3104)
- [BUG] Add over clause in read_sql percentile reads @colin-ho (#3094)
- [BUG] Disable Linux SSL CERT override @samster25 (#3098)
- [BUG] Fix into_partitions to use a more naive approach without materialization @jaychia (#3080)
- [BUG] Fix actor pool initialization in ray client mode @kevinzwang (#3028)
- [BUG]: joins with duplicate column names and qualified table expansion @universalmind303 (#3074)
- [BUG]: sql functions case sensitivity @universalmind303 (#3063)
- [BUG] Fix write_deltalake add action file path prefix @kevinzwang (#3053)
- [BUG] Fix intersection checking when unioning schemas @desmondcheongzx (#3039)
- [BUG] Sampling without replacement not working @colin-ho (#3035)
🧰 Maintenance
- [CHORE]: replace the .venv value with global variable VENV @mohamedrezk122 (#3084)
- [CHORE] Enable lancedb reads for native executor @colin-ho (#2925)
- [CHORE] Auto attach LLDB debugger to python #2940 @sagiahrac (#3020)
- [CHORE] Rename config.yaml to config.yml @samster25 (#3045)
- [CHORE] add config.yaml for issues @samster25 (#3044)
- [CHORE] validation on dropdown @samster25 (#3043)
- [CHORE] preserve quotes in yaml @samster25 (#3042)
- [CHORE] Checkbox for contribution @samster25 (#3041)
- [CHORE] update feature request @samster25 (#3040)
v0.3.8
Changes
👾 Bug Fixes
📖 Documentation
- [DOCUMENTATION] add value counts to rst @andrewgazelka (#3032)
🧰 Maintenance
- [DOCUMENTATION] add value counts to rst @andrewgazelka (#3032)
v0.3.7
v0.3.6
Changes
✨ New Features
- [FEAT] Implement standard deviation @raunakab (#3005)
- [FEAT] Add time travel to read_deltalake @kevinzwang (#3022)
- [FEAT] agg_list support for list and struct types @kevinzwang (#3019)
- [FEAT] Cast SparseTensor and FixedShapeSparseTensor to Python @sagiahrac (#3010)
- [FEAT] add list.value_counts() @andrewgazelka (#2902)
- [FEAT] Infer timedelta literal as duration @colin-ho (#3011)
- [DOCS] Naming consistency of length functions @vicky1999 (#2942)
👾 Bug Fixes
- [BUG] Pass parquet2 io errors correctly into arrow2 @desmondcheongzx (#3012)
- [BUG] Fix actor pool project splitting when column is not renamed @kevinzwang (#2998)
- [BUG] Add resources to Ray stateful UDF actor @kevinzwang (#2987)
- [BUG] Fix join errors with same key name joins (resolves #2649) @anmolsingh20 (#2877)
- [BUG]: error messages for add @universalmind303 (#2990)
📖 Documentation
- [FEAT] Implement standard deviation @raunakab (#3005)
- [DOC] fix link in doc @amitschang (#2944)
- [DOCS] Update readme to use python syntax highlighting @jaychia (#3006)
- [DOCS] Naming consistency of length functions @vicky1999 (#2942)
- [DOCS] Update readme to correctly reflect new messaging @jaychia (#3001)
🧰 Maintenance
- [CHORE] add/fix many clippy lints @andrewgazelka (#2978)
v0.3.5
Changes
✨ New Features
- [FEAT]: sql read_deltalake function @universalmind303 (#2974)
- [FEAT]: SQL add hash and minhash @universalmind303 (#2948)
- [FEAT] Enable init args for stateful UDFs @kevinzwang (#2956)
👾 Bug Fixes
- [BUG]: add count_matches and fix a bunch of str functions @universalmind303 (#2946)
- [BUG] Writes from empty partitions should return empty micropartitions with non-null schema @colin-ho (#2952)
- [CHORE] Enable test_creation and test_parquet for native executor @colin-ho (#2672)
- [BUG] improve error reporting for multistatement sql @amitschang (#2916)
- [BUG]: sql nested and wildcard @universalmind303 (#2937)
- [BUG] Enable groupby with alias for native executor @colin-ho (#2917)
- [BUG] Use dashes for machete dependency ignores @colin-ho (#2919)
📖 Documentation
- [DOCS] Fix docs to add SQL capabilities @jaychia (#2931)
- [DOCS] update arch png @samster25 (#2970)
- [DOCS] Add docs on to_arrow and as_arrow @samster25 (#2965)
- [DOCS]: add a helper function to list all sql functions @universalmind303 (#2943)
- [CHORE] Additional fixes for nightly tests @kevinzwang (#2936)
- [CHORE] Fix issues from nightly tests @kevinzwang (#2926)
🧰 Maintenance
- [CHORE] ignore 45e2944 @andrewgazelka (#2979)
- [CHORE] Enable test_creation and test_parquet for native executor @colin-ho (#2672)
- [CHORE] pin cargo machete to 0.7.0 @andrewgazelka (#2920)
- [CHORE] Refactor Binary Ops @samster25 (#2876)
- [CHORE] add pytest to vscode settings.json @andrewgazelka (#2930)
- [CHORE] Additional fixes for nightly tests @kevinzwang (#2936)
- [CHORE] update GH template name from md to yml @samster25 (#2934)
- [CHORE] update GH bug template @samster25 (#2932)
- [CHORE] Fix issues from nightly tests @kevinzwang (#2926)
- [CHORE] Enable sources to return empty tables @colin-ho (#2915)
v0.3.4
Changes
✨ New Features
- [FEAT] agg_concat doesn't work on strings @vicky1999 (#2847)
- [FEAT] Add ability for RayRunner to run actor pool projects (beta feature) @jaychia (#2881)
- [FEAT]: [SQL] struct subscript and json_query @universalmind303 (#2891)
- [FEAT] UTF8 to binary coercion flag @raunakab (#2893)
- [FEAT] Delta Lake partitioned writing @kevinzwang (#2884)
- [FEAT]: add partitioning_* functions to sql @universalmind303 (#2869)
- [FEAT]: add sql support for "DATE <date>" and "DATETIME <datetime>" @universalmind303 (#2870)
- [FEAT] Add Sparse Tensor logical type @michaelvay (#2722)
- [FEAT] [SQL] Enable SQL query to run on callers scoped variables @amitschang (#2864)
- Revert "[FEAT]: shuffle_join_default_partitions param" @jaychia (#2873)
- [FEAT] Iceberg partitioned writes @kevinzwang (#2842)
- [FEAT]: SQL temporal functions @universalmind303 (#2858)
- [FEAT]: sql list operations @universalmind303 (#2856)
- [FEAT]: shuffle_join_default_partitions param @universalmind303 (#2844)
- [FEAT] Add left/right/anti/semi joins to native executor @colin-ho (#2743)
🚀 Performance Improvements
- [PERF] Lazily import heavy modules to speed up import times @desmondcheongzx (#2826)
👾 Bug Fixes
- [BUG] Fix display for decimal types @raunakab (#2909)
- [BUG] Fix partitioning SQL scans on empty tables @desmondcheongzx (#2885)
- [BUG] Fix concat expression typing @colin-ho (#2868)
🧰 Maintenance
- [CHORE] Classify throttle and internal errors as Retryable in Python @samster25 (#2914)
- [CHORE] auto-fix prefer Self over explicit type @andrewgazelka (#2908)
- [CHORE]: bump sqlparser version @universalmind303 (#2886)
- [CHORE]: Move daft.sql.sql module to daft.sql @universalmind303 (#2907)
- [CHORE] ignore vendored crates for codecov @samster25 (#2895)
- [CHORE]: move numeric out of daft-dsl and into daft-functions @universalmind303 (#2857)
- [CHORE] Update documentation for config variables @jaychia (#2874)
- [CHORE] Move codspeed interactive tests to local files @samster25 (#2872)
- [CHORE]: move list functions from daft-dsl to daft-functions @universalmind303 (#2854)
- [CHORE] Change TPC-H q4 and q22 answers to use new join types @kevinzwang (#2756)
- [CHORE] Add native executor to CI @colin-ho (#2855)
⬆️ Dependencies
- Bump astral-sh/setup-uv from 2 to 3 @dependabot (#2888)
- Bump isbang/compose-action from 2.0.0 to 2.0.2 @dependabot (#2887)
v0.3.3
Changes
✨ New Features
- [FEAT]: Dataframe.filter method @universalmind303 (#2853)
- [FEAT] Add to_pylist on DataFrame @vicky1999 (#2816)
- [FEAT]: sql float operations @universalmind303 (#2834)
- [FEAT]: sql count(*) @universalmind303 (#2832)
- [FEAT] Delta lake allow unsafe rename for local writes @kevinzwang (#2824)
- [FEAT] Ellipsize glob scan paths @anmolsingh20 (#2809)
- [FEAT] [SQL] Add global agg support for SQL @amitschang (#2799)
- [FEAT] Adds str.length_bytes() function @thomasjpfan (#2775)
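The distinction behind a byte-length function such as the str.length_bytes() entry above can be sketched in plain Python. This does not call Daft: the helper names `length_chars` and `length_bytes` are hypothetical, and the use of UTF-8 as the encoding is an assumption made for the illustration.

```python
def length_chars(text: str) -> int:
    """Number of Unicode code points in the string."""
    return len(text)


def length_bytes(text: str) -> int:
    """Number of bytes in the string's UTF-8 encoding (assumed encoding)."""
    return len(text.encode("utf-8"))


# ASCII text: one byte per character, so both lengths agree.
print(length_chars("daft"), length_bytes("daft"))

# U+00E9 (e with acute accent) takes two bytes in UTF-8,
# so the byte length exceeds the character length by one.
print(length_chars("h\u00e9llo"), length_bytes("h\u00e9llo"))
```

The two measures only diverge for non-ASCII text, which is why a separate byte-oriented function is useful when sizing buffers or storage.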
🚀 Performance Improvements
👾 Bug Fixes
- [BUG]: Sql groupby fix @universalmind303 (#2843)
- [BUG] Avoid reconstructing sql query in read_sql @colin-ho (#2818)
- [BUG] Perform cleanup of tasks and results when iterator is deleted @jaychia (#2812)
- [BUG] Propagate S3Config.num_tries to pyarrow S3 filesystem @jmurray-clarify (#2800)
📖 Documentation
- [FEAT]: Dataframe.filter method @universalmind303 (#2853)
- [FEAT] Add to_pylist on DataFrame @vicky1999 (#2816)
- [FEAT] Delta lake allow unsafe rename for local writes @kevinzwang (#2824)
- [DOCS] Add docs to hash and hash to docs @kevinzwang (#2821)
- [DOCS] Trigger the workflow after PR Labeler runs @jaychia (#2823)
- [CHORE] Update netlify publishing @jaychia (#2814)
- [DOCS] Enable hosted docs preview @jaychia (#2803)
- [DOCS] Fix documentation errors @jaychia (#2811)
- [DOCS] Add grouping and aggregation docs @colin-ho (#2805)
- [DOCS] Casting matrix @colin-ho (#2801)
- [FEAT] Adds str.length_bytes() function @thomasjpfan (#2775)
🧰 Maintenance
- [CHORE] Add rustfmt config file and run formatter @raunakab (#2807)
- [CHORE] Concretize casting semantics for temporal + decimal types @colin-ho (#2798)
- [CHORE]: Move jq out of core @universalmind303 (#2828)
- [CHORE] Install Python before using uv @samster25 (#2840)
- [CHORE] Decouple Ray tensor types from main Daft logic @desmondcheongzx (#2829)
- [CHORE] Ensure compatibility with deltalake version v0.19 @kevinzwang (#2827)
- [CHORE] Update PyO3 and use their new Bound API @kevinzwang (#2793)
- [CHORE]: Move image kernel out of daft-core @universalmind303 (#2804)
- [CHORE] Cleanup display impls - follow-up PR @raunakab (#2820)
- [CHORE] Break daft-plan/daft-scheduler dependency on daft-io @jaychia (#2813)
- [CHORE] Remove enum imports daft core @raunakab (#2819)
- [CHORE] Add derive_more to get rid of manual Display impls @raunakab (#2794)
- [CHORE] Move out datatype and schema from daft-core @samster25 (#2806)
- [CHORE] Update netlify publishing @jaychia (#2814)
- [CHORE] Remove user-facing arguments for casting to Ray's tensor type @jaychia (#2802)
- [CHORE] Use treenode for tree traversal in logical optimizer rules @kevinzwang (#2797)
v0.3.2
Changes
✨ New Features
- [FEAT] Add runner logic in PyRunner for ActorPoolProject @jaychia (#2677)
- [FEAT]: sql image_encode and image_resize @universalmind303 (#2764)
- [FEAT] sql image_decode @universalmind303 (#2757)
- [FEAT] Add an approx_count_distinct expression (using the HLL algorithm) @raunakab (#2718)
- [FEAT] Add support for sum aggregation for decimal128 type @amitschang (#2755)
- [FEAT] expose more type info @chuanlei-coding (#2762)
- [FEAT] Adds SQL function modules @RCHowell (#2725)
- [FEAT] (ACTORS-3) Propagate feature flags from Planning Config through to logical optimizer @jaychia (#2674)
- [FEAT] Fix projection pushdowns in actor pool project @jaychia (#2680)
🚀 Performance Improvements
👾 Bug Fixes
- [BUG] Groupby with alias not working @colin-ho (#2790)
- [BUG] Fix parquet reads with limit across row groups @desmondcheongzx (#2751)
- [BUG] Fix ScanTask memory estimations when limits are provided @jaychia (#2735)
- [BUG] Enable Spawn Functions for IO and Compute Functions @samster25 (#2687)
- [BUG] Fix set_execution_config not setting hash_join_partition_size_leniency @Vince7778 (#2759)
- [BUG] Fix count("*") behavior @Vince7778 (#2733)
- [BUG] Add marker prefixes to filter during reads @colin-ho (#2726)
- [BUG]: fsl to list with validity @universalmind303 (#2729)
- [BUG]: use recordbatch instead of table for df.to_arrow_iter @universalmind303 (#2724)
📖 Documentation
- [DOCS] Fix struct accessors in tutorial examples @jaychia (#1809)
- Fix huggingface.rst documentation @asmith26 (#2746)
- [DOCS]: Fix typos in UDF documentation @amitschang (#2728)
- [DOCS] Fix small typo in partitioning.rst @jaychia (#2721)
🧰 Maintenance
- [CHORE] Enable all targets for cargo check @samster25 (#2792)
- [CHORE] refactor daft-core with prelude @samster25 (#2782)
- [CHORE] Implement thiserror::Error for DaftError and arrow2::Error @raunakab (#2785)
- [CHORE] Rename vpartition -> micropartition @jaychia (#2781)
- [CHORE] Add check for stateful UDF outside of project @kevinzwang (#2771)
- [CHORE] Fix conditional compilation for UDFs @jaychia (#2761)
- [CHORE] Refactor local hash joins + pipeline connections @colin-ho (#2719)
- [CHORE]: remove this file @universalmind303 (#2752)
- [CHORE] Add .lldbinit for debugging @kevinzwang (#2750)
- [CHORE] early terminate read parquet bulk @samster25 (#2748)
- [CHORE] add large fake files for benchmarks (disabled) @samster25 (#2744)
- [CHORE] disables aqe tests in CI @samster25 (#2745)
- [CHORE] add benchmarks for interactive reads @samster25 (#2732)
v0.3.1
Changes
✨ New Features
- [FEAT] (ACTORS-2) Add optimization pass to split Project into ActorPoolProject @jaychia (#2627)
- [FEAT] Stream results from native executor into python @colin-ho (#2667)
- [FEAT]: huggingface integration @universalmind303 (#2701)
🚀 Performance Improvements
- [PERF] Fix excessive parquet metadata reading @Vince7778 (#2694)
👾 Bug Fixes
- [BUG] Use python logging level @colin-ho (#2705)
- [BUG] Add a with_execution/planning_config context manager and fix tests for splitting of parquet @jaychia (#2713)
- [BUG] Fix Resource Request Serialization and factor out Serialize Object as bincode @samster25 (#2707)
📖 Documentation
- [DOCS] Partitioning user guide and small doc fixes @jaychia (#2717)
- [FEAT] (ACTORS-2) Add optimization pass to split Project into ActorPoolProject @jaychia (#2627)
- [BUG] Add a with_execution/planning_config context manager and fix tests for splitting of parquet @jaychia (#2713)
- Update PreCommit Hooks @samster25 (#2715)
- [FEAT]: huggingface integration @universalmind303 (#2701)