Releases: Eventual-Inc/Daft

v0.2.4

27 Nov 21:12
76ab4ca

Changes

✨ New Features

  • [FEAT] show number of truncated columns @samster25 (#1673)
  • [FEAT] add retries to s3 credential provider timeouts @samster25 (#1663)
  • [FEAT] Dynamic Responsive Printing of Tables, Schema and Series @samster25 (#1662)
  • [FEAT] Print the results of a df.show() to stdout if running in non-interactive mode @jaychia (#1655)
  • [FEAT] 1606 - Adding hour expression in date util @suriya-ganesh (#1637)
  • [FEAT] [CSV Reader] Bulk CSV reader + general CSV reader refactor @clarkzinzow (#1614)
  • [FEAT] Use cached preview from df.collect() in df.show(). @clarkzinzow (#1651)

🚀 Performance Improvements

👾 Bug Fixes

  • [BUG] Add an allowlist of DataTypes that ColumnRangeStatistics supports and validation of TableStatistics @jaychia (#1632)
  • [BUG] favor char indices instead of slicing to deal with unicode @samster25 (#1664)
  • [BUG] pass in pyarrow dtype manually into parquet read @samster25 (#1650)
  • [CHORE] Fixed bug in ray version @dioptre (#1649)

🧰 Maintenance

⬆️ Dependencies

8 changes

v0.2.3

20 Nov 19:29
c2d4c23

Changes

✨ New Features

  • Enabling quote, comment and escape character @suriya-ganesh (#1582)
  • [FEAT] Iceberg Scan Operator @samster25 (#1561)
  • [FEAT] Enable Progress Bars for PyRunner and RayRunner @samster25 (#1609)

👾 Bug Fixes

  • [BUG] Fix CSV roundtrip for decimals (actually an f64->decimal casting bug) @jaychia (#1626)
  • [BUG] Filter out size-0 directory marker files during s3 globs @jaychia (#1629)
  • [BUG] raise error for an invalid parquet file (smaller than the parquet footer size) @samster25 (#1628)
  • [BUG] Fix parquet timestamp tz roundtrip inference @jaychia (#1625)
  • [BUG] Roundtrip tests for CSVs and Parquet @jaychia (#1616)
  • [BUG] Self-concat breaks with the RayRunner @jaychia (#1617)
  • [BUG] Add better handling for case where glob of parquet files returns empty @jaychia (#1615)
  • [BUG] enable fixed size binary ingest to daft binary @samster25 (#1612)
  • [BUG] Manually specify region in tutorial read_json @jaychia (#1608)
  • [BUG] remove f strings from logging @samster25 (#1611)

📖 Documentation

  • [BUG] Manually specify region in tutorial read_json @jaychia (#1608)

🧰 Maintenance

⬆️ Dependencies

v0.2.2

14 Nov 19:47
6abc006

Changes

  • [CHORE] Edit 'make-hooks' command to install pre-commit script @colin-ho (#1602)
  • [CHORE] Improve error messages when calling aggregation methods on dataframe without input columns @colin-ho (#1587)

✨ New Features

  • [FEAT] Add translation of IOConfig to PyArrow filesystem arguments @jaychia (#1592)
  • [FEAT] [Scan Operator] Refactor planning and execution code to use shared Pushdowns struct. @clarkzinzow (#1595)
  • [FEAT] [Scan Operator] Add ChunkSpec for specifying format-specific per-file row subset selection for ScanTasks. @clarkzinzow (#1590)
  • [FEAT] [Scan Operator] Integrate size_bytes with ScanOperators @clarkzinzow (#1586)
  • [FEAT] [Scan Operator] Add Python I/O support (+ JSON) to MicroPartition reads @clarkzinzow (#1578)
  • [FEAT][ScanOperator 1/3] Add MVP e2e ScanOperator integration. @clarkzinzow (#1559)

🚀 Performance Improvements

  • [PERF][REVERT] Reverts: use pyarrow table for pickling rather than ChunkedArray (#1488) @jaychia (#1605)
  • [PERF] Speed Up MicroPartition Ops when we know the result is empty @samster25 (#1604)

👾 Bug Fixes

  • [BUG] clean up ray scheduler threads after computing partial results @samster25 (#1597)
  • [BUG] Update requirements for typing_extensions @jaychia (#1593)
  • [BUG] Fix Deadlock with ScanOperators in to_physical_plan_scheduler and show iostats for glob and from_scan_task @samster25 (#1581)
  • [BUG] add allow threads for io pool operations @samster25 (#1580)

🧰 Maintenance

  • [CHORE] delete unused wheel tools @samster25 (#1603)
  • [CHORE] add IOStats to all micropartition ops @samster25 (#1584)
  • [CHORE] Use DAFT_MICROPARTITIONS as shared feature flag for data catalog support @jaychia (#1579)
  • [CHORE] Convert GlobScanOperator to perform streaming into result and take a list of glob paths @jaychia (#1577)

⬆️ Dependencies

v0.2.1

01 Nov 00:16
c8fe883

Changes

  • [FEAT] Support disabling using doubled quotes to escape in CSV @ravern (#1544)
  • [DOCS]: fix typo in doc @amir-f (#1534)

✨ New Features

  • [FEAT] GlobScanOperator @jaychia (#1550)
  • [FEAT] [New Query Planner] [2/N] Push partition spec into physical plan, remove Coalesce logical op. @clarkzinzow (#1540)

👾 Bug Fixes

📖 Documentation

🧰 Maintenance

  • [CHORE] Fix bad merge conflict in GlobScanOperator wrt CSV schema inference @jaychia (#1556)
  • [CHORE] Revert "Bump pandas from 2.0.3 to 2.1.2" @jaychia (#1554)
  • [CHORE] [New Query Planner] [1/N] Remove Python query planner. @clarkzinzow (#1538)
  • [CHORE] changes to partition field and field creation @samster25 (#1537)
  • [CHORE] Move code from daft-csv to daft-decoding @jaychia (#1533)

⬆️ Dependencies

6 changes

v0.2.0

26 Oct 20:09
f49275b

Changes

✨ New Features

🚀 Performance Improvements

  • [PERF] Add "eager mode" to limits and use in .show() @jaychia (#1498)
  • [PERF] Micropartition, lazy loading and Column Stats @samster25 (#1470)
  • [PERF] Use pyarrow table for pickling rather than ChunkedArray @samster25 (#1488)
  • [PERF] Use region from system and leverage cached credentials when making new clients @samster25 (#1490)
  • [PERF] Update default max_connections 64->8 because it is now per-io-thread @jaychia (#1485)
  • [PERF] Pass-through multithreaded_io flag in read_parquet @jaychia (#1484)

👾 Bug Fixes

📖 Documentation

🧰 Maintenance

v0.1.20

10 Oct 01:01
439f2bd

Changes

✨ New Features

🚀 Performance Improvements

  • [PERF] Update number of cores on every iteration @jaychia (#1480)
  • [Hotfix] Change to streaming reader for CSV schema inference. @clarkzinzow (#1471)

👾 Bug Fixes

  • [BUG] Properly dispatch limited reads in new query planner @xcharleslin (#1476)
  • [BUG] Fixes globbing on windows by consolidating on posix-style paths @jaychia (#1472)

🧰 Maintenance

v0.1.19

06 Oct 22:42
bb74530

Changes

✨ New Features

🚀 Performance Improvements

👾 Bug Fixes

  • [BUG] fix circular import when PYTHONPATH is set @samster25 (#1474)
  • [BUG] Don't remove all handles and Only use handler for files in src/ @samster25 (#1473)

🧰 Maintenance

v0.1.18

26 Sep 01:17
3403c0c

Changes

✨ New Features

👾 Bug Fixes

📖 Documentation

  • [BUG] [Docs] Allow source code discovery to fail silently for pyo3-defined classes when generating docs. @clarkzinzow (#1430)
  • [FEAT] Implement .dt.year/month/day for timestamp types @jaychia (#1385)

🧰 Maintenance

v0.1.17

12 Sep 06:39
601260b

Changes

✨ New Features

🚀 Performance Improvements

👾 Bug Fixes

  • [BUG] Respect multithreaded_io flag when reading parquet @samster25 (#1359)
  • [BUG] Schema Display should use dtype Display instead of Debug @jaychia (#1355)
  • [BUG] propagate parquet io error instead of panicking @samster25 (#1352)

🧰 Maintenance

  • [CHORE] [New Query Planner] Add simple df.explain() option; change to fixed-point policy for rule batch @clarkzinzow (#1354)
  • [CHORE] Add status code to IO integration tests @jaychia (#1356)
  • [CHORE] Fix List/FixedSizeList DataType to hold a dtype instead of Field @jaychia (#1351)
  • [CHORE] Add Series::full_null/empty/from_arrow to reduce code duplication @jaychia (#1331)
  • [CHORE] Add a Growable factory method @jaychia (#1330)
  • [CHORE] Add new ListArray @jaychia (#1329)

⬆️ Dependencies

5 changes

v0.1.16

06 Sep 02:07
bdc4ba4

Changes

✨ New Features

👾 Bug Fixes

  • [BUG] Fix Table.read_parquet behavior when it encounters arrow_schema @jaychia (#1336)
  • [BUG] [New Query Planner] Revert file info partition column names. @clarkzinzow (#1333)
  • [BUG] Fix fixed size list array FullNull implementation @jaychia (#1320)

🧰 Maintenance

  • [CHORE] install perl before maturin @samster25 (#1345)
  • [CHORE] Switch to openssl @samster25 (#1344)
  • [CHORE] [New Query Planner] pyo3-agnostic LogicalPlanBuilder, op constructor arg orderings @clarkzinzow (#1332)
  • [CHORE] factor io config into common code @samster25 (#1335)
  • [CHORE] [New Query Planner] Remove ExpressionsProjection from builder, move validation into Op::try_new() @clarkzinzow (#1327)
  • [CHORE] StructArray refactors @jaychia (#1326)
  • [CHORE] drop flag for non native compile for daft profiling @samster25 (#1323)
  • [CHORE] pin pyarrow to 12 for ray compat tests @samster25 (#1322)
  • [CHORE] Move FixedSizeListArray to array/fixed_size_list_array.rs @jaychia (#1319)
  • [CHORE] Add fix for list schema inference tests using PyArrow 13.0.0 @jaychia (#1318)
  • [CHORE] Implementations of FixedSizeListArray @jaychia (#1281)

⬆️ Dependencies