Releases · Eventual-Inc/Daft

v0.2.4

Released 27 Nov 21:12 by github-actions (commit 76ab4ca)

Changes

✨ New Features

[FEAT] show number of truncated columns @samster25 (#1673)
[FEAT] add retries to s3 credential provider timeouts @samster25 (#1663)
[FEAT] Dynamic Responsive Printing of Tables, Schema and Series @samster25 (#1662)
[FEAT] Print the results of a df.show() to stdout if running in non-interactive mode @jaychia (#1655)
[FEAT] 1606 - Adding hour expression in date util @suriya-ganesh (#1637)
[FEAT] [CSV Reader] Bulk CSV reader + general CSV reader refactor @clarkzinzow (#1614)
[FEAT] Use cached preview from df.collect() in df.show(). @clarkzinzow (#1651)
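#1663 adds retries when the S3 credential provider times out. The sketch below shows the general retry-with-exponential-backoff pattern in Python. It is not Daft's implementation. The helper `retry_on_timeout`, its parameters, and the fake provider are hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_timeout(
    fn: Callable[[], T],
    max_tries: int = 3,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on TimeoutError with exponential backoff and jitter (hypothetical helper)."""
    if max_tries < 1:
        raise ValueError("max_tries must be >= 1")
    for attempt in range(1, max_tries + 1):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_tries:
                raise  # out of attempts: surface the last timeout to the caller
            # Double the delay on each attempt; jitter spreads out concurrent retries.
            delay = base_delay_s * (2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")


# Usage: a fake provider that times out twice before returning credentials.
calls = {"n": 0}


def flaky_provider() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("credential provider timed out")
    return "credentials"


print(retry_on_timeout(flaky_provider, sleep=lambda s: None))  # credentials
```

Injecting `sleep` keeps the helper testable without real waits.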

🚀 Performance Improvements

[PERF] Remove calls to remote_len_partition @jaychia (#1660)

👾 Bug Fixes

[BUG] Add an allowlist of DataTypes that ColumnRangeStatistics supports and validation of TableStatistics @jaychia (#1632)
[BUG] favor char indices instead of slicing to deal with unicode @samster25 (#1664)
[BUG] pass in pyarrow dtype manually into parquet read @samster25 (#1650)
[CHORE] Fixed bug in ray version @dioptre (#1649)
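#1664 favors character indices over slicing to handle Unicode. The Python sketch below illustrates the underlying problem. Cutting a UTF-8 string at a byte offset can split a multi-byte character, while counting characters cannot. This is an illustration only. The helper names and the "…" truncation marker are assumptions, not Daft's display code.

```python
def truncate_by_bytes(s: str, max_bytes: int) -> str:
    # Cuts the UTF-8 encoding at a byte offset; a multi-byte character may be split,
    # leaving an invalid tail that decodes to U+FFFD.
    return s.encode("utf-8")[:max_bytes].decode("utf-8", errors="replace")


def truncate_by_chars(s: str, max_chars: int, marker: str = "\u2026") -> str:
    # Python str indices count code points, so a character is never split in half.
    if len(s) <= max_chars:
        return s
    keep = max(max_chars - len(marker), 0)
    return s[:keep] + marker


text = "na\u00efve caf\u00e9"  # "naïve café"
print(truncate_by_bytes(text, 3))  # "na" followed by a replacement character
print(truncate_by_chars(text, 6))  # naïve…
```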

🧰 Maintenance

[CHORE] pin pandas for 3.8 @samster25 (#1661)
[CHORE] pin ray to 2.7.1 if less than 3.8 @samster25 (#1657)
[CHORE] enable refresh on tqdm total updates @samster25 (#1654)

⬆️ Dependencies

Bump chrono-tz from 0.8.3 to 0.8.4 @dependabot (#1670)
Bump pytest from 7.4.1 to 7.4.3 @dependabot (#1644)
Bump pandas from 2.0.3 to 2.1.3 @dependabot (#1643)
Bump azure-storage-blob from 12.17.0 to 12.19.0 @dependabot (#1645)
Bump async-compression from 0.4.4 to 0.4.5 @dependabot (#1638)
Bump serde_json from 1.0.107 to 1.0.108 @dependabot (#1639)
Bump base64 from 0.21.4 to 0.21.5 @dependabot (#1640)
Bump dyn-clone from 1.0.14 to 1.0.16 @dependabot (#1642)

Contributors

dioptre, samster25, and 4 other contributors

v0.2.3

Released 20 Nov 19:29 by github-actions (commit c2d4c23)

Changes

✨ New Features

Enabling quote, comment and escape character @suriya-ganesh (#1582)
[FEAT] Iceberg Scan Operator @samster25 (#1561)
[FEAT] Enable Progress Bars for PyRunner and RayRunner @samster25 (#1609)

👾 Bug Fixes

[BUG] Fix CSV roundtrip for decimals (actually an f64->decimal casting bug) @jaychia (#1626)
[BUG] Filter out size-0 directory marker files during s3 globs @jaychia (#1629)
[BUG] raise error if non valid parquet file (less than parquet footer size) @samster25 (#1628)
[BUG] Fix parquet timestamp tz roundtrip inference @jaychia (#1625)
[BUG] Roundtrip tests for CSVs and Parquet @jaychia (#1616)
[BUG] Self-concat breaks with the RayRunner @jaychia (#1617)
[BUG] Add better handling for case where glob of parquet files returns empty @jaychia (#1615)
[BUG] enable fixed size binary ingest to daft binary @samster25 (#1612)
[BUG] Manually specify region in tutorial read_json @jaychia (#1608)
[BUG] remove f strings from logging @samster25 (#1611)
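#1628 raises an error for files too small to be valid Parquet. In the Parquet format, a file begins with the 4-byte magic `PAR1` and ends with a 4-byte little-endian metadata length followed by `PAR1` again, so anything under 12 bytes cannot be valid. The sketch below applies that check to an in-memory buffer. The function name is hypothetical. The exact threshold Daft checks may differ, since the release note mentions only the footer size.

```python
import struct

MAGIC = b"PAR1"
FOOTER_SIZE = 4 + len(MAGIC)  # 4-byte metadata length + trailing magic
MIN_FILE_SIZE = len(MAGIC) + FOOTER_SIZE  # leading magic + footer = 12 bytes


def parquet_metadata_length(data: bytes) -> int:
    """Validate Parquet framing and return the footer's metadata length (hypothetical helper)."""
    if len(data) < MIN_FILE_SIZE:
        raise ValueError(
            f"{len(data)} bytes is smaller than the minimum Parquet size of {MIN_FILE_SIZE} bytes"
        )
    if data[: len(MAGIC)] != MAGIC or data[-len(MAGIC):] != MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    # The 4 bytes just before the trailing magic hold the metadata length.
    (metadata_len,) = struct.unpack("<I", data[-FOOTER_SIZE:-len(MAGIC)])
    if metadata_len > len(data) - MIN_FILE_SIZE:
        raise ValueError("footer metadata length exceeds the file size")
    return metadata_len


# Usage: a synthetic buffer with 5 bytes of (fake) metadata.
fake = MAGIC + b"\x00" * 5 + struct.pack("<I", 5) + MAGIC
print(parquet_metadata_length(fake))  # 5
```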

📖 Documentation

[BUG] Manually specify region in tutorial read_json @jaychia (#1608)

🧰 Maintenance

[CHORE] Fix style lints from #1582 @jaychia (#1635)
[CHORE] add ray client to deps @samster25 (#1631)
[CHORE] update fsspecs (s3, gcs, adlfs) in lockstep @samster25 (#1620)
[CHORE] update azure storage blobs to 0.17.0 @samster25 (#1622)
[CHORE] delete old rule runners @samster25 (#1619)
[CHORE] drop ray default dep to make room for Pydantic > 2.0 @samster25 (#1618)

⬆️ Dependencies

Bump moonrepo/setup-rust from 0 to 1 @dependabot (#1237)
Bump google-cloud-storage from 0.13.1 to 0.14.0 @dependabot (#1549)
Bump async-compat from 0.2.2 to 0.2.3 @dependabot (#1567)

Contributors

samster25, subygan, and 2 other contributors

v0.2.2

Released 14 Nov 19:47 by github-actions (commit 6abc006)

Changes

[CHORE] Edit 'make-hooks' command to install pre-commit script @colin-ho (#1602)
[CHORE] Improve error messages when calling aggregation methods on dataframe without input columns @colin-ho (#1587)

✨ New Features

[FEAT] Add translation of IOConfig to PyArrow filesystem arguments @jaychia (#1592)
[FEAT] [Scan Operator] Refactor planning and execution code to use shared Pushdowns struct. @clarkzinzow (#1595)
[FEAT] [Scan Operator] Add ChunkSpec for specifying format-specific per-file row subset selection for ScanTasks. @clarkzinzow (#1590)
[FEAT] [Scan Operator] Integrate size_bytes with ScanOperators @clarkzinzow (#1586)
[FEAT] [Scan Operator] Add Python I/O support (+ JSON) to MicroPartition reads @clarkzinzow (#1578)
[FEAT][ScanOperator 1/3] Add MVP e2e ScanOperator integration. @clarkzinzow (#1559)

🚀 Performance Improvements

[PERF][REVERT] Reverts: use pyarrow table for pickling rather than ChunkedArray (#1488) @jaychia (#1605)
[PERF] Speed Up MicroPartition Ops when we know the result is empty @samster25 (#1604)

👾 Bug Fixes

[BUG] clean up ray scheduler threads after computing partial results @samster25 (#1597)
[BUG] Update requirements for typing_extensions @jaychia (#1593)
[BUG] Fix Deadlock with ScanOperators in to_physical_plan_scheduler and show iostats for glob and from_scan_task @samster25 (#1581)
[BUG] add allow threads for io pool operations @samster25 (#1580)

🧰 Maintenance

[CHORE] delete unused wheel tools @samster25 (#1603)
[CHORE] add IOStats to all micropartition ops @samster25 (#1584)
[CHORE] Use DAFT_MICROPARTITIONS as shared feature flag for data catalog support @jaychia (#1579)
[CHORE] Convert GlobScanOperator to perform streaming into result and take a list of glob paths @jaychia (#1577)

⬆️ Dependencies

Bump numpy from 1.25.2 to 1.26.2 @dependabot (#1596)

Contributors

samster25, clarkzinzow, and 3 other contributors

v0.2.1

Released 01 Nov 00:16 by github-actions (commit c8fe883)

Changes

[FEAT] Support disabling using doubled quotes to escape in CSV @ravern (#1544)
[DOCS]: fix typo in doc @amir-f (#1534)

✨ New Features

[FEAT] GlobScanOperator @jaychia (#1550)
[FEAT] [New Query Planner] [2/N] Push partition spec into physical plan, remove Coalesce logical op. @clarkzinzow (#1540)

👾 Bug Fixes

[BUG] Fix reads of empty parquet files @jaychia (#1555)
[BUG] Bump Parquet reader max_page_size to 256MB @jaychia (#1553)
[BUG] add sort after running passes @samster25 (#1545)
[BUG] Fix credentials issues in colab/CI @jaychia (#1539)

📖 Documentation

[BUG] Fix credentials issues in colab/CI @jaychia (#1539)

🧰 Maintenance

[CHORE] Fix bad merge conflict in GlobScanOperator wrt CSV schema inference @jaychia (#1556)
[CHORE] Revert "Bump pandas from 2.0.3 to 2.1.2" @jaychia (#1554)
[CHORE] [New Query Planner] [1/N] Remove Python query planner. @clarkzinzow (#1538)
[CHORE] changes to partition field and field creation @samster25 (#1537)
[CHORE] Move code from daft-csv to daft-decoding @jaychia (#1533)

⬆️ Dependencies

Bump pandas from 2.0.3 to 2.1.2 @dependabot (#1542)
Bump tempfile from 3.8.0 to 3.8.1 @dependabot (#1548)
Bump opencv-python from 4.8.0.76 to 4.8.1.78 @dependabot (#1546)
Bump aws-actions/configure-aws-credentials from 3 to 4 @dependabot (#1384)
Bump async-trait from 0.1.71 to 0.1.74 @dependabot (#1496)
Bump serde from 1.0.188 to 1.0.190 @dependabot (#1541)

Contributors

amir-f, samster25, and 4 other contributors

v0.2.0

Released 26 Oct 20:09 by github-actions (commit f49275b)

Changes

✨ New Features

[FEAT] Anonymous Scan Operator @samster25 (#1526)
[FEAT] Micropartition integration and tests @jaychia (#1502)
[FEAT] Make Binary Type Comparable @samster25 (#1528)
[FEAT] implement series serde @samster25 (#1519)
[FEAT] Add streaming + parallel CSV reader, with decompression support. @clarkzinzow (#1501)
[FEAT] IOStats for Native Reader @samster25 (#1493)

🚀 Performance Improvements

[PERF] Add "eager mode" to limits and use in .show() @jaychia (#1498)
[PERF] Micropartition, lazy loading and Column Stats @samster25 (#1470)
[PERF] Use pyarrow table for pickling rather than ChunkedArray @samster25 (#1488)
[PERF] Use region from system and leverage cached credentials when making new clients @samster25 (#1490)
[PERF] Update default max_connections 64->8 because it is now per-io-thread @jaychia (#1485)
[PERF] Pass-through multithreaded_io flag in read_parquet @jaychia (#1484)
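#1498 adds an "eager mode" to limits and uses it in `.show()`. One way to picture the idea: when partitions are produced lazily, a limit can stop requesting partitions as soon as it has enough rows. The sketch below assumes partitions arrive as an iterator of Python lists. It is not Daft's executor, and `eager_limit` and `lazy_partitions` are hypothetical.

```python
from typing import Iterable, List, TypeVar

Row = TypeVar("Row")


def eager_limit(partitions: Iterable[List[Row]], n: int) -> List[Row]:
    # Pull partitions one at a time and stop as soon as n rows have been collected,
    # so later partitions are never materialized.
    rows: List[Row] = []
    if n <= 0:
        return rows
    for part in partitions:
        rows.extend(part[: n - len(rows)])
        if len(rows) >= n:
            break
    return rows


computed = []


def lazy_partitions():
    # Hypothetical stand-in for partitions that are expensive to compute.
    for i in range(10):
        computed.append(i)
        yield [f"row{i}-{j}" for j in range(3)]


print(eager_limit(lazy_partitions(), 5))  # first 5 rows, taken from partitions 0 and 1
print(computed)  # [0, 1]
```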

👾 Bug Fixes

[BUG] Fix timestamp timezone parsing bug in CSVs @jaychia (#1530)
[BUG] Re-raise exceptions in rayrunner @jaychia (#1522)
[BUG] [CSV Reader] Fix CSV parsing bugs around nulls and timestamps. @clarkzinzow (#1523)
[BUG] Fix handling of special characters in S3LikeSource @jaychia (#1495)
[BUG] Fix local globbing of current directory @jaychia (#1494)
[BUG] fix script to upload file 1 at a time @samster25 (#1492)
[CHORE] Add tests and fixes for Azure globbing @jaychia (#1482)

📖 Documentation

[FEAT] Micropartition integration and tests @jaychia (#1502)

🧰 Maintenance

[CHORE] Allow release-drafter to increment minor version @jaychia (#1532)
[CHORE] Soft deprecation of fsspec from user-facing APIs @jaychia (#1467)
[CHORE] bring up fixtures for iceberg @samster25 (#1527)
[CHORE] Skip IO integration tests if being run from dependabot @jaychia (#1521)
[CHORE] Better logging for physical plan @jaychia (#1499)
[CHORE] Refactor logging @jaychia (#1489)
[CHORE] Add Workflow to build artifacts and upload to S3 @samster25 (#1491)
[CHORE] Update default num_tries on S3Config to 25 @jaychia (#1487)
[CHORE] Add tests and fixes for Azure globbing @jaychia (#1482)

Contributors

samster25, clarkzinzow, and jaychia

v0.1.20

Released 10 Oct 01:01 by github-actions (commit 439f2bd)

Changes

✨ New Features

[FEAT] Streaming CSV reads @xcharleslin (#1479)
[FEAT] [Native I/O] Add a native CSV reader. @clarkzinzow (#1475)

🚀 Performance Improvements

[PERF] Update number of cores on every iteration @jaychia (#1480)
[Hotfix] Change to streaming reader for CSV schema inference. @clarkzinzow (#1471)
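#1471 moves CSV schema inference to a streaming reader, and #1479 adds streaming CSV reads. The Python sketch below shows the general idea of inferring a schema from only the first rows of a stream instead of parsing the whole file. The null/int/float/str widening order, the `max_rows` cutoff and the function names are assumptions, not Daft's inference rules.

```python
import csv
import io
from itertools import islice
from typing import Dict, TextIO

# Widening order: each column is typed as the widest type seen in its sample.
ORDER = ["null", "int", "float", "str"]


def value_type(v: str) -> str:
    if v == "":
        return "null"
    for name, parse in (("int", int), ("float", float)):
        try:
            parse(v)
            return name
        except ValueError:
            pass
    return "str"


def infer_csv_schema(stream: TextIO, max_rows: int = 100) -> Dict[str, str]:
    reader = csv.reader(stream)
    header = next(reader)
    widest = ["null"] * len(header)
    # islice stops after max_rows records, so no further records are parsed.
    for row in islice(reader, max_rows):
        for i, v in enumerate(row[: len(header)]):
            t = value_type(v)
            if ORDER.index(t) > ORDER.index(widest[i]):
                widest[i] = t
    return dict(zip(header, widest))


data = io.StringIO("a,b,c\n1,2.5,x\n3,,y\n")
print(infer_csv_schema(data))  # {'a': 'int', 'b': 'float', 'c': 'str'}
```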

👾 Bug Fixes

[BUG] Properly dispatch limited reads in new query planner @xcharleslin (#1476)
[BUG] Fixes globbing on windows by consolidating on posix-style paths @jaychia (#1472)

🧰 Maintenance

[CHORE] Create SECURITY.md @samster25 (#1481)

Contributors

samster25, xcharleslin, and 2 other contributors

v0.1.19

Released 06 Oct 22:42 by github-actions (commit bb74530)

Changes

✨ New Features

[FEAT] Native globbing for other backends @jaychia (#1460)
[FEAT] Native glob functionality @jaychia (#1450)
[FEAT] ls/list_dir for AzureBlobStorage @xcharleslin (#1408)
[FEAT] Add .str.split() API for splitting string columns. @clarkzinzow (#1409)
[FEAT] Add local native filesystem globbing. @clarkzinzow (#1449)
[FEAT] Native listing of http URLs @jaychia (#1405)
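#1450 and #1460 add native glob support. A common approach for object stores, shown in this sketch, is to list only under the longest literal prefix of the pattern and then match each path segment, so `*` never crosses a `/`. The helper names are hypothetical, `**` is not supported, and Daft's native globber may behave differently.

```python
import fnmatch
import re
from typing import List

_WILDCARD = re.compile(r"[*?\[]")


def literal_prefix(pattern: str) -> str:
    # Everything before the first wildcard, cut back to the last '/' so it names a "directory".
    m = _WILDCARD.search(pattern)
    prefix = pattern if m is None else pattern[: m.start()]
    return prefix[: prefix.rfind("/") + 1]


def glob_match(path: str, pattern: str) -> bool:
    # Match segment by segment so that '*' never crosses a '/' boundary.
    path_parts, pattern_parts = path.split("/"), pattern.split("/")
    return len(path_parts) == len(pattern_parts) and all(
        fnmatch.fnmatchcase(p, g) for p, g in zip(path_parts, pattern_parts)
    )


def glob_keys(keys: List[str], pattern: str) -> List[str]:
    # In a real object store the prefix would be passed to the list call;
    # here it simply filters an in-memory key list.
    prefix = literal_prefix(pattern)
    return [k for k in keys if k.startswith(prefix) and glob_match(k, pattern)]


keys = [
    "data/2023/a.parquet",
    "data/2023/b.csv",
    "data/2023/sub/c.parquet",
    "data/2024/d.parquet",
    "other/e.parquet",
]
print(literal_prefix("data/*/*.parquet"))  # data/
print(glob_keys(keys, "data/*/*.parquet"))  # ['data/2023/a.parquet', 'data/2024/d.parquet']
```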

🚀 Performance Improvements

[PERF] Local filesystem parquet reader @samster25 (#1461)
[PERF] Native globbing early stopping @jaychia (#1452)

👾 Bug Fixes

[BUG] fix circular import when PYTHONPATH is set @samster25 (#1474)
[BUG] Don't remove all handlers and only use handler for files in src/ @samster25 (#1473)

🧰 Maintenance

[FEAT] Native globbing for other backends @jaychia (#1460)
[CHORE] update s3 connection defaults @samster25 (#1451)

Contributors

samster25, xcharleslin, and 2 other contributors

v0.1.18

Released 26 Sep 01:17 by github-actions (commit 3403c0c)

Changes

✨ New Features

[FEAT] Add support for windows in daft @samster25 (#1386)
[FEAT] Add debug logging to s3 native apis @samster25 (#1414)
[FEAT] enable path style for s3 custom endpoints by default @samster25 (#1410)
[FEAT] Native S3 Lister, support trailing slashes and fix panics when connection is dropped for tokio @samster25 (#1404)
[FEAT] Native Rust listing of GCS @jaychia (#1392)
[FEAT] [New Query Planner] Enable new query planner by default. @clarkzinzow (#1398)
[FEAT] Parameter to set num_parallel_tasks for bulk readers @samster25 (#1399)
[FEAT] Native S3 Client: allow disabling ssl verification or checking hostnames @samster25 (#1395)
[FEAT] Improved projection folding. @xcharleslin (#1374)
[FEAT] bulk parquet pyarrow reader @samster25 (#1396)
[FEAT] Native Recursive File Lister @samster25 (#1353)
[FEAT] Implement .dt.year/month/day for timestamp types @jaychia (#1385)
[FEAT] [New Query Planner] Add support for fsspec filesystems to new query planner. @clarkzinzow (#1357)
[FEAT] Common subexpression elimination in Projection construction @xcharleslin (#1347)
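#1347 adds common subexpression elimination when constructing projections. The sketch below shows the general technique on expression trees encoded as tuples: a non-leaf subexpression that occurs more than once is computed once in a first projection and referenced by name afterwards. The tuple encoding and the `__cse_N` column names are assumptions for illustration, not Daft's expression representation.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple, Union

# An expression is a column name (str) or a tuple ("op", child, child, ...).
Expr = Union[str, Tuple]


def subexpressions(e: Expr) -> Iterator[Tuple]:
    # Yield every non-leaf node in the tree, including e itself.
    if isinstance(e, tuple):
        yield e
        for child in e[1:]:
            yield from subexpressions(child)


def eliminate_common_subexpressions(
    projection: List[Expr],
) -> Tuple[List[Tuple[str, Expr]], List[Expr]]:
    counts = Counter(s for e in projection for s in subexpressions(e))
    hoisted: Dict[Tuple, str] = {}

    def rewrite(e: Expr) -> Expr:
        if isinstance(e, tuple):
            if counts[e] > 1:
                # Repeated subexpression: compute it once and refer to it by name.
                if e not in hoisted:
                    hoisted[e] = f"__cse_{len(hoisted)}"
                return hoisted[e]
            return (e[0], *(rewrite(c) for c in e[1:]))
        return e

    final = [rewrite(e) for e in projection]
    # First stage computes each hoisted expression once. A full version would also
    # pass through the plain columns (here "c" and "d") that the final stage reads.
    first_stage = [(name, expr) for expr, name in hoisted.items()]
    return first_stage, final


a_plus_b = ("+", "a", "b")
first, final = eliminate_common_subexpressions([("*", a_plus_b, "c"), ("-", a_plus_b, "d")])
print(first)  # [('__cse_0', ('+', 'a', 'b'))]
print(final)  # [('*', '__cse_0', 'c'), ('-', '__cse_0', 'd')]
```

Because rewriting is top-down, only the outermost repeated subexpression in each branch is replaced.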

👾 Bug Fixes

[BUG] Fix num input partitions in coalesce. @clarkzinzow (#1442)
[BUG] Fix scheme bug in GCS anonymous mode @jaychia (#1443)
[BUG] Fix runner check at plan execution time for new query planner @clarkzinzow (#1435)
[BUG] [Docs] Allow source code discovery to fail silently for pyo3-defined classes when generating docs. @clarkzinzow (#1430)
[BUG] patch workspace version when building wheels @samster25 (#1418)
[BUG] Anaconda client: don't upload src wheels @samster25 (#1415)
[BUG] Anaconda client needs wildcard for upload @samster25 (#1413)
[BUG] Fix gs listing to include 0 sized marker files @jaychia (#1412)
[BUG] force upload of anaconda nightly wheels @samster25 (#1411)
[BUG] add test cases for bulk minio reading @samster25 (#1402)
[BUG] Fixes to S3 Native Lister with correct Error propagation @samster25 (#1401)
[BUG] Fix public API decorator type annotations. @clarkzinzow (#1397)
[BUG] Fix partition spec bugs from old query planner @xcharleslin (#1372)

📖 Documentation

[BUG] [Docs] Allow source code discovery to fail silently for pyo3-defined classes when generating docs. @clarkzinzow (#1430)
[FEAT] Implement .dt.year/month/day for timestamp types @jaychia (#1385)

🧰 Maintenance

[CHORE] disable windows pytest after building @samster25 (#1420)
[CHORE] add caching for pip wheels @samster25 (#1419)
[CHORE] macos xl runners are 0.32/minute not hour... @samster25 (#1417)
[CHORE] Centralize pyo3 pickling around __reduce__ + bincode macro. @clarkzinzow (#1394)
[CHORE] larger macos runner for builds @samster25 (#1403)
[CHORE] Add stubs and improve comments for pyo3-exposed abstractions, + driveby type/bug fixes. @clarkzinzow (#1377)
[CHORE] add retries for broken link checker @samster25 (#1378)
[CHORE] pin azure-storage-blob due to breaking new version @samster25 (#1373)
[CHORE] [New Query Planner] Misc. user-facing error tweaks to improve UX. @clarkzinzow (#1358)

Contributors

samster25, xcharleslin, and 2 other contributors

v0.1.17

Released 12 Sep 06:39 by github-actions (commit 601260b)

Changes

✨ New Features

[FEAT] Native Parquet Reader into pyarrow directly @samster25 (#1366)
[FEAT] Add configurable io thread pool size @samster25 (#1363)
[FEAT] Add flag to limit number of connections to S3 @samster25 (#1360)
[FEAT] export jemalloc arm64 flag inside container @samster25 (#1362)
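#1360 adds a flag to limit the number of connections to S3, and #1363 makes the I/O thread pool size configurable. A semaphore is a usual way to cap in-flight requests. The asyncio sketch below assumes a hypothetical async `fetch` function and is not Daft's native client.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def fetch_all(
    keys: List[str],
    fetch: Callable[[str], Awaitable[T]],
    max_connections: int,
) -> List[T]:
    # At most max_connections fetches hold the semaphore at any moment.
    sem = asyncio.Semaphore(max_connections)

    async def bounded(key: str) -> T:
        async with sem:
            return await fetch(key)

    # gather returns results in the same order as the input keys.
    return list(await asyncio.gather(*(bounded(k) for k in keys)))


# Usage: a fake fetch that records the peak number of requests in flight.
in_flight = 0
peak = 0


async def fake_fetch(key: str) -> str:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    return key.upper()


results = asyncio.run(fetch_all([f"obj{i}" for i in range(10)], fake_fetch, max_connections=3))
print(results[:2], peak)  # ['OBJ0', 'OBJ1'] 3
```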

🚀 Performance Improvements

[PERF] Use owned Stream in Parquet Page Iterator @samster25 (#1365)
[PERF] enable jemalloc with background threads @samster25 (#1361)
[PERF] Add microbenchmarks for takes @jaychia (#1350)
[PERF] Optimize filter on nested growables @jaychia (#1349)

👾 Bug Fixes

[BUG] Respect multithreaded_io flag when reading parquet @samster25 (#1359)
[BUG] Schema Display should use dtype Display instead of Debug @jaychia (#1355)
[BUG] propagate parquet io error instead of panicking @samster25 (#1352)

🧰 Maintenance

[CHORE] [New Query Planner] Add simple df.explain() option; change to fixed-point policy for rule batch @clarkzinzow (#1354)
[CHORE] Add status code to IO integration tests @jaychia (#1356)
[CHORE] Fix List/FixedSizeList DataType to hold a dtype instead of Field @jaychia (#1351)
[CHORE] Add Series::full_null/empty/from_arrow to reduce code duplication @jaychia (#1331)
[CHORE] Add a Growable factory method @jaychia (#1330)
[CHORE] Add new ListArray @jaychia (#1329)

⬆️ Dependencies

Bump tokio from 1.29.1 to 1.32.0 @dependabot (#1371)
Bump tempfile from 3.7.1 to 3.8.0 @dependabot (#1285)
Bump pyo3 from 0.19.1 to 0.19.2 @dependabot (#1312)
Bump pytest from 7.4.0 to 7.4.1 @dependabot (#1339)
Bump actions/checkout from 3 to 4 @dependabot (#1337)

Contributors

samster25, clarkzinzow, and 2 other contributors

v0.1.16

Released 06 Sep 02:07 by github-actions (commit bdc4ba4)

Changes

✨ New Features

[FEAT] __repr__ for ResourceRequest @xcharleslin (#1343)
[FEAT] [New Query Planner] Refactor file globbing logic by exposing FileInfos to Python @clarkzinzow (#1307)
[FEAT] S3 Native List Impl for a directory @samster25 (#1324)
[FEAT] [New Query Planner] Add support for DropRepartition @clarkzinzow (#1302)
[FEAT] Add all projection optimization rules to new query planner. @xcharleslin (#1288)
[FEAT] [New Query Planner] Add support for PushDownLimit @clarkzinzow (#1300)
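#1300 adds a PushDownLimit rule to the new query planner. The sketch below shows the general shape of such a rewrite on a toy plan: a limit can move below a row-wise projection and be folded into the scan so the reader can stop early. The plan classes are hypothetical, not Daft's logical plan types, and Daft's rule may handle different cases.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Scan:
    path: str
    limit: Optional[int] = None  # rows the reader may stop after


@dataclass(frozen=True)
class Project:
    input: "Plan"
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Limit:
    input: "Plan"
    n: int


Plan = Union[Scan, Project, Limit]


def push_down_limit(plan: Plan) -> Plan:
    if isinstance(plan, Limit):
        child = plan.input
        if isinstance(child, Project):
            # A projection maps rows one-to-one, so limiting before it is equivalent.
            return Project(push_down_limit(Limit(child.input, plan.n)), child.columns)
        if isinstance(child, Scan):
            # Fold the limit into the scan so the reader can stop early.
            n = plan.n if child.limit is None else min(plan.n, child.limit)
            return Scan(child.path, n)
        if isinstance(child, Limit):
            # Two stacked limits keep only the smaller one.
            return push_down_limit(Limit(child.input, min(plan.n, child.n)))
    if isinstance(plan, Project):
        return Project(push_down_limit(plan.input), plan.columns)
    return plan


plan = Limit(Project(Scan("data/*.parquet"), ("a", "b")), 10)
print(push_down_limit(plan))
# Project(input=Scan(path='data/*.parquet', limit=10), columns=('a', 'b'))
```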

👾 Bug Fixes

[BUG] Fix Table.read_parquet behavior when it encounters arrow_schema @jaychia (#1336)
[BUG] [New Query Planner] Revert file info partition column names. @clarkzinzow (#1333)
[BUG] Fix fixed size list array FullNull implementation @jaychia (#1320)

🧰 Maintenance

[CHORE] install perl before maturin @samster25 (#1345)
[CHORE] Switch to openssl @samster25 (#1344)
[CHORE] [New Query Planner] pyo3-agnostic LogicalPlanBuilder, op constructor arg orderings @clarkzinzow (#1332)
[CHORE] factor io config into common code @samster25 (#1335)
[CHORE] [New Query Planner] Remove ExpressionsProjection from builder, move validation into Op::try_new() @clarkzinzow (#1327)
[CHORE] StructArray refactors @jaychia (#1326)
[CHORE] drop flag for non native compile for daft profiling @samster25 (#1323)
[CHORE] pin pyarrow to 12 for ray compat tests @samster25 (#1322)
[CHORE] Move FixedSizeListArray to array/fixed_size_list_array.rs @jaychia (#1319)
[CHORE] Add fix for list schema inference tests using PyArrow 13.0.0 @jaychia (#1318)
[CHORE] Implementations of FixedSizeListArray @jaychia (#1281)

⬆️ Dependencies

Bump ray[data,default] from 2.6.0 to 2.6.3 @dependabot (#1315)
Bump orjson from 3.9.4 to 3.9.5 @dependabot (#1316)
Bump aws-actions/configure-aws-credentials from 2 to 3 @dependabot (#1317)

Contributors

samster25, xcharleslin, and 3 other contributors
