- Replace the default parquet engine: deprecate fastparquet and use pyarrow as the default
- Remove "chunksize" from collection.py as it's not used by dask nor pyarrow.
- Solve issue #69
- Fix collection.write by passing the overwrite parameter.
- Rename metadata.json to pystore_metadata.json to avoid conflicts with pyarrow
- Fixed deprecated 'in' operator usage to remain compatible with pandas 1.2.0 onwards (PR #58)
- Add argument to append() to control duplicates (PR #57)
- Uses the PYSTORE_PATH environment variable, if set, as the storage path when pystore.set_path() is not called (defaults to ~/pystore; see the example at the end of this changelog)
- Updated PyPI install script (lib is the same as 0.1.20)
- Fix: Resetting config._CLIENT to None
- Fixed: Exposed set/get_partition_size and set/get_clients
- Added support for dask.distributed via pystore.set_client(...) (see the sketch at the end of this changelog)
- Added store.item(...) for accessing a single collection item directly (PR #44)
- Added store.set_partition_size(...) and store.get_partition_size(). Default is ~99MB.
- Updated PyPI install script (lib is the same as 0.1.16)
- Fixed npartitions=None issues on .append()
- Fixed append issues
- Raising an error when trying to read an invalid item
- Fixed path issues (removed unnecessary os.path.join calls)
- Auto-detection and handling of nanosecond-based data
- collection.reload_items defaults to False
- Default npartitions and chunksize are better optimized (~99MB/partition)
- collection.apply() repartitions the dataframe based on new data size (~99MB/partition)
- Option to set the store's default engine via engine="fastparquet" or engine="pyarrow" (defaults to fastparquet; see the example at the end of this changelog)
- Solving fastparquet/numba issues when using Dask >= 2.2.0 by importing numba in __init__.py
- Added reload_items (default True) to collection.write and collection.delete to explicitly re-read the collection's items' directory
- Reversed list_snapshots() behaviour
- Added collection.threaded_write(...) method
- collection.items is now updated via items.add() and an async/threaded directory read
- Switched from dtype_str to str(dtype) (Pandas 0.25+ compatibility)
- Implemented collection.items and collection.snapshots as @property to reduce initialization overhead
- collection.items and collection.snapshots are now of type set()
- Option to specify both npartitions and chunksize in collection.append()
- Fixed issues #13 and #15
- Added pystore.read_csv() to quickly load a CSV as a Dask dataframe, ready for storage (see the example at the end of this changelog)
- Using os.path.expanduser("~") to determine user's home directory
- collection.write(...) accepts Dask dataframes
- Misc improvements
- Added support for Python 2.7
- Added support for Python 3.7
- Fixed support for nanosecond-level data
- epochdate defaults to True when storing ns data
- Switched to dtype_str instead of str(dtype)
- Infer datetime format when converting to Pandas
- Increased version to fix setup
- Bugfixes
- Switched path parsing to pathlib.Path to help with cross-platform compatibility
- Minor code refactoring
- Adding an index name when one is not available
- Added pystore.delete_store(NAME), pystore.delete_stores(), and pystore.get_path()
- Added a Jupyter notebook example to the GitHub repo
- Minor code refactoring
- Allowing _ and . in snapshot name
- Changed license to Apache License, Version 2.0
- Module split into separate files
- Code refactoring
- Added support for snapshots (see the sketch at the end of this changelog)
- collection.list_items() supports querying based on metadata
- Some code refactoring
- Exposing more methods
- Path setting moved to pystore.set_path()
- Store.collection() auto-creates collection
- Updated readme to reflect changes
- Minor code refactoring
- Not converting datetime to epoch by default (use epochdate=True to enable; see the example at the end of this changelog)
- Using "snappy" compression by default
- Metadata's "_updated" is now a YYYY-MM-DD HH:MM:SS.MS string
- Can pass columns and filters to the Item object
- Faster append
- Store.path is now public
- Updated license version
- Switched readme/changelog files from .md to .rst.
- Initial release
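
A minimal sketch of the PYSTORE_PATH fallback referenced above; the directory and datastore name are placeholders::

    # Shell (before starting Python): export PYSTORE_PATH=/data/pystore
    import pystore

    # No pystore.set_path() call, so PYSTORE_PATH is used if set,
    # otherwise ~/pystore.
    store = pystore.store("mydatastore")
    print(pystore.get_path())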
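A sketch of the dask.distributed and partition-size entries above, using the names as they appear in those entries; the scheduler address, sizes, and item names are placeholders, the partition size is assumed to be in bytes, and the (collection, item) argument order for store.item() is an assumption::

    import pystore

    # Route pystore's dask work through a distributed scheduler
    pystore.set_client("tcp://127.0.0.1:8786")

    store = pystore.store("mydatastore")

    # Partition-size helpers; ~99MB is the documented default
    store.set_partition_size(99 * 1024 * 1024)
    print(store.get_partition_size())

    # Access a single collection item directly
    item = store.item("mycollection", "SYMBOL")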
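A sketch of choosing the store's parquet engine, assuming the engine keyword is passed to pystore.store() as the entry above suggests::

    import pystore

    # Defaults to "fastparquet"; pass "pyarrow" to switch
    store = pystore.store("mydatastore", engine="pyarrow")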
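A sketch of the CSV helper referenced above, assuming it accepts a file path and returns a Dask dataframe that collection.write() can store as-is; file, item, and metadata names are placeholders::

    import pystore

    collection = pystore.store("mydatastore").collection("mycollection")

    # Load a CSV straight into a Dask dataframe, ready for storage
    df = pystore.read_csv("prices.csv")
    collection.write("SYMBOL", df, metadata={"source": "csv"})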
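A sketch of snapshots and metadata-based queries referenced above; the keyword-argument query syntax for list_items() and the snapshot argument to item() are assumptions, and all names are placeholders::

    import pystore

    collection = pystore.store("mydatastore").collection("mycollection")

    # Query items by their stored metadata
    items_from_quandl = collection.list_items(source="Quandl")

    # Create a point-in-time snapshot and read an item from it later
    collection.create_snapshot("pre_update")
    print(collection.list_snapshots())
    old_item = collection.item("SYMBOL", snapshot="pre_update")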
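A sketch combining the write/read options referenced above: epochdate on write, columns and filters on read; the tuple-based filter syntax follows the parquet engines' convention and is an assumption::

    import pandas as pd
    import pystore

    collection = pystore.store("mydatastore").collection("mycollection")

    df = pd.DataFrame(
        {"close": [1.0, 2.0], "volume": [100, 200]},
        index=pd.to_datetime(["2019-01-01", "2019-01-02"]),
    )

    # Store the datetime index as epoch values (off by default)
    collection.write("SYMBOL", df, epochdate=True, overwrite=True)

    # Read back selected columns, filtered at the parquet level
    item = collection.item("SYMBOL", columns=["close"],
                           filters=[("volume", ">", 100)])
    data = item.to_pandas()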