Rewriting the partitioning · devin-petersohn/modin@36be7cb

Rewriting the partitioning

Adding some partitioning updates

Updating remote

Continuing backend rewrite progress

Adding idxmax/min

Adding head/tail/repr for data_manager

Fixing transpose

Updating remote with bugfixes for transpose/repr

Fixing a number of operations

Fix quantile

Adding more functionality, __getitem__

Fixing more tests

Fixing drop, passing tests

Add insert to new structure

Cleaning up unneeded imports

Updating remote

Fix minor bug

Updating remote

Add sort_index

Minor refactor of code. Cleaning some up

Continuing logic migration

Add some docs

Adding more docs

Add more method level documentation.

Adding documentation and cleaning up

Restructuring partitioning files for simplicity

Adding more docs, cleaning up docs

Fix performance bug, more cleanup

More cleanup/renaming

Adding factory skeleton

Adding from_pandas code path and update constructor

Removing debugging code

Cleaning up dead code

Added type checking and changed how variables were read in from kwargs (#1)

Removing IndexMetadata and code_gen

Adding preliminary apply method

Updated sample and eval to the new backend (#2)

* Added type checking and changed how variables were read in from kwargs

* Updated sample to new architecture

* Made test_sample more rigorous

* Removed 'default=' from kwargs.get's

* Updated eval to the new backend

* Added two more tests for eval

Finalizing apply and start agg

Fixing some broken stuff with apply, update remote

Starting dictionary apply and fillna

Fixed dictionary apply and fillna

Moves Inter DataFrame Operations Logic to Data Manager (#3)

* Moving multi dataframe operation logic to data_manager

* Remove Unused Functions from dataframe.py

* removed unnecessary isScalar arg

* minor code cleanup

* changed _operator_handler name

* removing hasattr from data_manager

* cleaning up dataframe.py code for add function

* changing name to _validate_other

* cleaned up kwargs parsing in data_manager function

* updated all inter df functions

* commenting out old helper functions in dataframe.py

* cleaned up unused code

* fixed type error for functions using map_across_axis

Updated info and memory_usage to new backend (#4)

* Added type checking and changed how variables were read in from kwargs

* Updated sample to new architecture

* Made test_sample more rigorous

* Removed 'default=' from kwargs.get's

* Updated eval to the new backend

* Added two more tests for eval

* Updated memory_usage to new backend

* Updated info and memory_usage to the new backend

* Updated info and memory_usage to be standalone tests and updated the tests

* Updated info to do only one pass

* Updated info to do everything in one run with DataFrame

* Update info to do everything in one run with Series

* Updated info to do everything in one run with DataFrame

* Updated to get everything working and moved appropriate parts to DataManager

Adding first where implementation

Adding sort_values and update implementations

Cleaning up dead code

Adding manual_shuffle abstraction

Starting merge

Add merge

Cleaning up

Add dtype (#6)

* Added type checking and changed how variables were read in from kwargs

* Updated sample to new architecture

* Made test_sample more rigorous

* Removed 'default=' from kwargs.get's

* Updated eval to the new backend

* Added two more tests for eval

* Updated memory_usage to new backend

* Updated info and memory_usage to the new backend

* Updated info and memory_usage to be standalone tests and updated the tests

* Updated info to do only one pass

* Updated info to do everything in one run with DataFrame

* Update info to do everything in one run with Series

* Updated info to do everything in one run with DataFrame

* Updated to get everything working and moved appropriate parts to DataManager

* Removed extraneous print statement

* Moved dtypes stuff to data manager

* Fixed calculating dtypes to only doing a full_reduce instead of map_full_axis

* Updated astype to new backend

* Updated astype to new backend

* Updated ftypes to new backend

* Added dtypes argument to map_partitions

* Fixing dtypes

* Cleaning up dtype and merge issues

Fix isin bug

Cleaning up

Cleaning up more unused code

Updated iterables and to_datetime to new backend and improved astype runtime (#7)

* Added type checking and changed how variables were read in from kwargs

* Updated sample to new architecture

* Made test_sample more rigorous

* Removed 'default=' from kwargs.get's

* Updated eval to the new backend

* Added two more tests for eval

* Updated memory_usage to new backend

* Updated info and memory_usage to the new backend

* Updated info and memory_usage to be standalone tests and updated the tests

* Updated info to do only one pass

* Updated info to do everything in one run with DataFrame

* Update info to do everything in one run with Series

* Updated info to do everything in one run with DataFrame

* Updated to get everything working and moved appropriate parts to DataManager

* Removed extraneous print statement

* Moved dtypes stuff to data manager

* Fixed calculating dtypes to only doing a full_reduce instead of map_full_axis

* Updated astype to new backend

* Updated astype to new backend

* Updated ftypes to new backend

* Added dtypes argument to map_partitions

* Updated astype and added dtypes option to _from_old_block_partitions in RayPandasDataManager

* Undid unnecessary change

* Updated iterables to new backend

* Updated to_datetime to new backend

* Reverted some changes for PR

* Replaced pd with pandas

* Made additional changes mentioned in (#7)

Cleaning up

Cleaning up imports

Fix minor bug from getting kwargs

Concat now working with new architecture (#9)

* concat now working with new architecture

* fixing functionality for pandas Series

* updated append_list_of_data_managers function for concat

* minor stylistic fix

* remove unused append_data_manager function

* fixed join

* removed axis arg from join function

read_csv changes and improvements in performance (#10)

* Test changes to io

* Update io changes

* Fix performance bug

* Debugging performance

* Debugging performance on large IO

* Making some performance tuning changes

* Cleaning up and adding performance improvements

* Cleaning up

* Addressing comments

* Addressing comments

Fix bug

Formatting

fix fillna bug

updated rdiv, rpow, rsub methods (#12)

* updated rdiv, rpow, rsub methods

* spelled dataframe wrong

Fixed eval and astype (#11)

* Updated to_datetime docstring

* Updated astype tests

* Commented out loc and iloc tests

* Updated eval

* removed empty space and uncommented test_loc and test_iloc

Passes test_mixed_dtype_dataframe and test_nan_dataframe (#15)

* Fixed describe and quantiles and cleaned up code

* Updated numeric functions and handles empty dataframes

* Fixed dtypes and ftypes

* Imported is_numeric_dtype from pandas

* Cleaned up print statements in test_dataframe.py

Cleaning up and enabling tests. Fix __repr__

Removing dead code

Fix where bug

Fix append error checking

Fix read_csv args bug

Fix read_parquet

Groupby implementation

Adding groupby final fix

Adding docs

Fix for info (#16)

* Quick fix for info

* Removed extraneous print statement

* Restructured to use count and memory_usage instead

Minor optimization change

get_dummies implementation (#19)

* initial code for get_dummies

* Starting help on get_dummies

* Fix get_dummies

* Removing dead code

* bug fix for get_dummies

Rewrite loc (#20)

* Rewrite the rewrite

Finish implementing loc/iloc

Remove debug lines, fix typo

Removing unused imports

* Removing dead code

* Changing naming of clone

* Formatting and removing dead code

* Moving imports for matching pandas

devin-petersohn committed Sep 17, 2018

1 parent cccea0b commit 36be7cb

modin/__init__.py

@@ -24,5 +24,25 @@ def _execute_cmd_in_temp_env(cmd):
         return "Unknown"
+def get_execution_engine():
+    # In the future, when there are multiple engines and different ways of
+    # backing the DataFrame, there will have to be some changed logic here to
+    # decide these things. In the meantime, we will use the currently supported
+    # execution engine + backing (Pandas + Ray).
+    return "Ray"
+def get_partition_format():
+    # See note above about engine + backing.
+    return "Pandas"
 __git_revision__ = git_version()
 __version__ = "0.1.2"
+__execution_engine__ = get_execution_engine()
+__partition_format__ = get_partition_format()
+# We don't want these used outside of this file.
+del git_version
+del get_execution_engine
+del get_partition_format
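
The diff above exposes the engine and partition format as module-level attributes, with a comment anticipating multiple engines later. As a rough sketch only, downstream code could use the pair to pick a backend. The registry `_BACKENDS`, the function `select_backend`, and the stand-in `pkg` object are hypothetical and not part of this commit. `RayPandasDataManager` is a name taken from the commit message, and mapping it to the ("Ray", "Pandas") pair is an assumption.

```python
import types

# Stand-in for the imported package. In the commit, these two attributes are
# set in modin/__init__.py; here they are hard-coded for illustration.
pkg = types.SimpleNamespace(
    __execution_engine__="Ray",
    __partition_format__="Pandas",
)

# Hypothetical registry: (engine, partition format) -> backend class name.
# Only the combination mentioned in the diff comment is listed.
_BACKENDS = {
    ("Ray", "Pandas"): "RayPandasDataManager",
}


def select_backend(package):
    """Return the backend name registered for the package's engine/format pair."""
    key = (package.__execution_engine__, package.__partition_format__)
    try:
        return _BACKENDS[key]
    except KeyError:
        # Unknown combinations fail loudly rather than falling back silently.
        raise NotImplementedError(
            "Unsupported engine/format combination: {}".format(key)
        )


print(select_backend(pkg))
```

Keeping the lookup keyed on both values means a new engine or partition format would only need a new registry entry, which matches the intent stated in the diff comment but is not confirmed by the source.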

modin/data_management/__init__.py

Empty file.
