Implement simple SQL example and write intro docs #20
modin/sql/connection.py
Outdated
        table.append(to_append, ignore_index=True)
    print(self._tables[split_query[2]])
else:
    print("ERROR")
raise NotImplementedError
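A minimal sketch of this suggestion, assuming a hypothetical `execute` function that stands in for the method under review (the `CREATE TABLE` branch is inferred from the surrounding diff): an unsupported statement raises `NotImplementedError` rather than printing "ERROR", so a caller cannot mistake a failure for success.

```python
def execute(query):
    """Hypothetical stand-in for the query handler under review."""
    split_query = query.split(" ")
    statement = " ".join(split_query[:2])
    if statement in ("CREATE TABLE", "INSERT INTO"):
        # The real handling lives in modin/sql/connection.py; elided here.
        return statement
    # Fail loudly instead of printing "ERROR" and carrying on.
    raise NotImplementedError("Unsupported query: {}".format(query))
```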
Looks really great! Just a few points noted.
@@ -0,0 +1,5 @@
SQL on Ray
Can we add the code example to this page too?
from __future__ import division
from __future__ import print_function

from ..pandas import Series, DataFrame
Will the ray.init call run if DataFrame is imported relatively?
Yes, it does run.
Perfect, thanks for confirming.
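To illustrate the answer: Python executes a package's `__init__.py` the first time anything is imported from it, whether the import is absolute or relative. The sketch below builds a throwaway package on disk (all names are hypothetical, and a module-level list stands in for the `ray.init` call) and shows the side effect firing through a relative import.

```python
import importlib
import os
import sys
import tempfile
import textwrap

root = tempfile.mkdtemp()


def write(relpath, source):
    """Write a dedented source file under the temporary root."""
    path = os.path.join(root, relpath)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(textwrap.dedent(source))


write("demo_pkg/__init__.py", "")
write("demo_pkg/frames/__init__.py", """
    # Stand-in for the ray.init() call made when the subpackage loads.
    INIT_CALLS = ["init"]

    class DataFrame(object):
        pass
""")
write("demo_pkg/sql/__init__.py", "")
write("demo_pkg/sql/connection.py", """
    from ..frames import DataFrame  # relative import, as in the PR
""")

sys.path.insert(0, root)
importlib.invalidate_caches()

# Importing the module that uses the relative import also runs
# demo_pkg/frames/__init__.py, including its module-level side effect.
connection = importlib.import_module("demo_pkg.sql.connection")
frames = sys.modules["demo_pkg.frames"]
print(frames.INIT_CALLS)
```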
modin/sql/connection.py
Outdated
    columns = Series(column_names)
    self._tables[split_query[2]] = DataFrame(columns=columns)

elif " ".join(split_query[:2]) == "INSERT INTO":
Can we split these into helper methods? That will make it easier to build further functionality on top of this later.
Sure, makes sense. Most likely this code will get thrown away in the future, but the skeleton might stay.
lgtm
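One way the requested split might look, as a sketch only: each statement type gets its own helper method and `execute` just dispatches. The class name `Connection`, the helper names, and the simplified query parsing are assumptions, and plain pandas stands in for the Modin `DataFrame` used in the PR. Values are parsed with `ast.literal_eval` here rather than the `eval` call in the diff, and rows are added with `pandas.concat` rather than `append`.

```python
import ast

import pandas


class Connection(object):
    """Hypothetical skeleton; not the actual modin.sql implementation."""

    def __init__(self):
        self._tables = {}

    def execute(self, query):
        split_query = query.split(" ")
        statement = " ".join(split_query[:2])
        if statement == "CREATE TABLE":
            self._create_table(split_query)
        elif statement == "INSERT INTO":
            self._insert_into(split_query)
        else:
            raise NotImplementedError(statement)

    def _create_table(self, split_query):
        # Expects: CREATE TABLE name (col1, col2, ...)
        column_names = " ".join(split_query[3:]).strip("()").split(",")
        columns = [name.strip() for name in column_names]
        self._tables[split_query[2]] = pandas.DataFrame(columns=columns)

    def _insert_into(self, split_query):
        # Expects: INSERT INTO name VALUES (v1, v2, ...)
        table = self._tables[split_query[2]]
        values = ast.literal_eval(" ".join(split_query[4:]))
        row = pandas.DataFrame([values], columns=table.columns)
        self._tables[split_query[2]] = pandas.concat(
            [table, row], ignore_index=True
        )
```

With this shape, supporting a new statement means adding one helper and one branch, which is the kind of skeleton that could survive even if the parsing itself is thrown away.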
to_append = Series([eval(i) for i in values], index=table.columns)
self._tables[split_query[2]] = \
    table.append(to_append, ignore_index=True)
print(self._tables[split_query[2]])
Should this be here?
We want to show that it's inserting. It's only for a demonstration and will be removed in the future.
Adding some partitioning updates Updating remote Continuing backend rewrite progress Adding idxmax/min Adding head/tail/repr for data_manager Fixing transpose Updating remote with bugfixes for transpose/repr Fixing a number of operations Fix quantile Adding more functionality, __getitem__ Fixing more tests Fixing drop, passing tests Add insert to new structure Cleaning up unneeded imports Updating remote Fix minor bug Updating remote Add sort_index Minor refactor of code. Cleaning some up Continuing logic migration Add some docs Adding more docs Add more method level documentation. Adding documentation and cleaning up Retructuring partitioning files for simplicity Adding more docs, cleaning up docs Fix performance bug, more cleanup More cleanup/renaming Adding factory skeleton Adding from_pandas code path and update constructor Removing debugging code Cleaning up dead code Added type checking and changed how variables were read in from kwargs (modin-project#1) Removing IndexMetadata and code_gen Adding preliminary apply method Updated sample and eval to the new backend (modin-project#2) * Added type checking and changed how variables were read in from kwargs * Updated sample to new architecture * Made test_sample more rigourous * Removed 'default=' from kwargs.get's * Updated eval to the new backend * Added two more tests for eval Finalizing apply and start agg Fixing some broken stuff with apply, update remote Starting dictionary apply and fillna Fixed dictionary apply and fillna Moves Inter DataFrame Operations Logic to Data Manager (modin-project#3) * Moving multi dataframe operation logic to data_manager * Remove Unused Functions from dataframe.py * removed unnecessary isScalar arg * minor code cleanup * changed _operator_handler name * removing hasattr from data_manager * cleaning up dataframe.py code for add function * changing name to _validate_other * cleaned up kwargs parsing in data_manager function * updated all inter df functions * commenting out old 
helper functions in dataframe.py * cleaned up unused code * fixed type error for functions using map_across_axis Updated info and memory_usage to new backend (modin-project#4) * Added type checking and changed how variables were read in from kwargs * Updated sample to new architecture * Made test_sample more rigourous * Removed 'default=' from kwargs.get's * Updated eval to the new backend * Added two more tests for eval * Updated memory_usage to new backend * Updated info and memory_usage to the new backend * Updated info and memory_usage to be standalone tests and updated the tests * Updated info to do only one pass * Updated info to do everything in one run with DataFrame * Update info to do everything in one run with Series * Updated info to do everything in one run with DataFrame * Updated to get everything working and moved appropriate parts to DataManager Adding first where implementation Adding sort_values and update implementations Cleaning up dead code Adding manual_shuffle abstraction Starting merge Add merge Cleaning up Add dtype (modin-project#6) * Added type checking and changed how variables were read in from kwargs * Updated sample to new architecture * Made test_sample more rigourous * Removed 'default=' from kwargs.get's * Updated eval to the new backend * Added two more tests for eval * Updated memory_usage to new backend * Updated info and memory_usage to the new backend * Updated info and memory_usage to be standalone tests and updated the tests * Updated info to do only one pass * Updated info to do everything in one run with DataFrame * Update info to do everything in one run with Series * Updated info to do everything in one run with DataFrame * Updated to get everything working and moved appropriate parts to DataManager * Removed extraneous print statement * Moved dtypes stuff to data manager * Fixed calculating dtypes to only doing a full_reduce instead of map_full_axis * Updated astype to new backend * Updated astype to new backend * 
Updated ftypes to new backend * Added dtypes argument to map_partitions * Fixing dtypes * Cleaning up dtype and merge issues Fix isin bug Cleaning up Cleaning up more unused code Updated iterables and to_datetime to new backend and improved astype runtime (modin-project#7) * Added type checking and changed how variables were read in from kwargs * Updated sample to new architecture * Made test_sample more rigourous * Removed 'default=' from kwargs.get's * Updated eval to the new backend * Added two more tests for eval * Updated memory_usage to new backend * Updated info and memory_usage to the new backend * Updated info and memory_usage to be standalone tests and updated the tests * Updated info to do only one pass * Updated info to do everything in one run with DataFrame * Update info to do everything in one run with Series * Updated info to do everything in one run with DataFrame * Updated to get everything working and moved appropriate parts to DataManager * Removed extraneous print statement * Moved dtypes stuff to data manager * Fixed calculating dtypes to only doing a full_reduce instead of map_full_axis * Updated astype to new backend * Updated astype to new backend * Updated ftypes to new backend * Added dtypes argument to map_partitions * Updated astype and added dtypes option to _from_old_block_partitions in RayPandasDataManager * Undid unnecessary change * Updated iterables to new backend * Updated to_datetime to new backend * Reverted some changes for PR * Replaced pd with pandas * Made additional changes mentioned in (modin-project#7) Cleaning up Cleaning up imports Fix minor bug from getting kwargs Concat now working with new architecture (modin-project#9) * concat now working with new architecture * fixing functionality for pandas Series * updated append_list_of_data_managers function for concat * minor stylistic fix * remove unused append_data_manager function * fixed join * removed axis arg from join function read_csv changes and improvements in 
performance (modin-project#10) * Test changes to io * Update io changes * Fix performance bug * Debugging performance * Debugging performance on large IO * Making some performance tuning changes * Cleaning up and adding performance improvements * Cleaning up * Addressing comments * Addressing comments Fix bug Formatting fix fillna bug updated rdiv, rpow, rsub methods (modin-project#12) * updated rdiv, rpow, rsub methods * spelled dataframe wrong Fixed eval and astype (modin-project#11) * Updated to_datetime docstring * Updated astype tests * Commented out loc and iloc tests * Updated eval * removed empty space and uncommented test_loc and test_iloc Passes test_mixed_dtype_dataframe and test_nan_dataframe (modin-project#15) * Fixed describe and quantiles and cleaned up code * Updated numeric functions and handles empty dataframes * Fixed dtypes and ftypes * Imported is_numeric_dtype from pandas * Cleaned up print statements in test_dataframe.py Cleaning up and enabling tests. Fix __repr__ Removing dead code Fix where bugy Fix append error checking Fix read_csv args bug Fix read_parquet Groupby implementation Adding groupby final fix Adding docs Fix for info (modin-project#16) * Quick fix for info * Removed extraneous print statement * Restructured to use count and memory_usage instead Minor optimization change get_dummies implementation (modin-project#19) * intial code for get_dummies * Starting help on get_dummies * Fix get_dummies * Removing dead code * bug fix for get_dummies Rewrite loc (modin-project#20) * Rewrite the rewrite Finish implement loc/iloc Remove debug lines, fix typo Removing unused imports * Removing dead code * Changing naming of clone * Formatting and removing dead code * Moving imports for matching pandas
* Rewriting the partitioning Adding some partitioning updates Updating remote Continuing backend rewrite progress Adding idxmax/min Adding head/tail/repr for data_manager Fixing transpose Updating remote with bugfixes for transpose/repr Fixing a number of operations Fix quantile Adding more functionality, __getitem__ Fixing more tests Fixing drop, passing tests Add insert to new structure Cleaning up unneeded imports Updating remote Fix minor bug Updating remote Add sort_index Minor refactor of code. Cleaning some up Continuing logic migration Add some docs Adding more docs Add more method level documentation. Adding documentation and cleaning up Retructuring partitioning files for simplicity Adding more docs, cleaning up docs Fix performance bug, more cleanup More cleanup/renaming Adding factory skeleton Adding from_pandas code path and update constructor Removing debugging code Cleaning up dead code Added type checking and changed how variables were read in from kwargs (#1) Removing IndexMetadata and code_gen Adding preliminary apply method Updated sample and eval to the new backend (#2) * Added type checking and changed how variables were read in from kwargs * Updated sample to new architecture * Made test_sample more rigourous * Removed 'default=' from kwargs.get's * Updated eval to the new backend * Added two more tests for eval Finalizing apply and start agg Fixing some broken stuff with apply, update remote Starting dictionary apply and fillna Fixed dictionary apply and fillna Moves Inter DataFrame Operations Logic to Data Manager (#3) * Moving multi dataframe operation logic to data_manager * Remove Unused Functions from dataframe.py * removed unnecessary isScalar arg * minor code cleanup * changed _operator_handler name * removing hasattr from data_manager * cleaning up dataframe.py code for add function * changing name to _validate_other * cleaned up kwargs parsing in data_manager function * updated all inter df functions * commenting out old helper 
functions in dataframe.py * cleaned up unused code * fixed type error for functions using map_across_axis Updated info and memory_usage to new backend (#4) * Added type checking and changed how variables were read in from kwargs * Updated sample to new architecture * Made test_sample more rigourous * Removed 'default=' from kwargs.get's * Updated eval to the new backend * Added two more tests for eval * Updated memory_usage to new backend * Updated info and memory_usage to the new backend * Updated info and memory_usage to be standalone tests and updated the tests * Updated info to do only one pass * Updated info to do everything in one run with DataFrame * Update info to do everything in one run with Series * Updated info to do everything in one run with DataFrame * Updated to get everything working and moved appropriate parts to DataManager Adding first where implementation Adding sort_values and update implementations Cleaning up dead code Adding manual_shuffle abstraction Starting merge Add merge Cleaning up Add dtype (#6) * Added type checking and changed how variables were read in from kwargs * Updated sample to new architecture * Made test_sample more rigourous * Removed 'default=' from kwargs.get's * Updated eval to the new backend * Added two more tests for eval * Updated memory_usage to new backend * Updated info and memory_usage to the new backend * Updated info and memory_usage to be standalone tests and updated the tests * Updated info to do only one pass * Updated info to do everything in one run with DataFrame * Update info to do everything in one run with Series * Updated info to do everything in one run with DataFrame * Updated to get everything working and moved appropriate parts to DataManager * Removed extraneous print statement * Moved dtypes stuff to data manager * Fixed calculating dtypes to only doing a full_reduce instead of map_full_axis * Updated astype to new backend * Updated astype to new backend * Updated ftypes to new backend * Added 
dtypes argument to map_partitions * Fixing dtypes * Cleaning up dtype and merge issues Fix isin bug Cleaning up Cleaning up more unused code Updated iterables and to_datetime to new backend and improved astype runtime (#7) * Added type checking and changed how variables were read in from kwargs * Updated sample to new architecture * Made test_sample more rigourous * Removed 'default=' from kwargs.get's * Updated eval to the new backend * Added two more tests for eval * Updated memory_usage to new backend * Updated info and memory_usage to the new backend * Updated info and memory_usage to be standalone tests and updated the tests * Updated info to do only one pass * Updated info to do everything in one run with DataFrame * Update info to do everything in one run with Series * Updated info to do everything in one run with DataFrame * Updated to get everything working and moved appropriate parts to DataManager * Removed extraneous print statement * Moved dtypes stuff to data manager * Fixed calculating dtypes to only doing a full_reduce instead of map_full_axis * Updated astype to new backend * Updated astype to new backend * Updated ftypes to new backend * Added dtypes argument to map_partitions * Updated astype and added dtypes option to _from_old_block_partitions in RayPandasDataManager * Undid unnecessary change * Updated iterables to new backend * Updated to_datetime to new backend * Reverted some changes for PR * Replaced pd with pandas * Made additional changes mentioned in (#7) Cleaning up Cleaning up imports Fix minor bug from getting kwargs Concat now working with new architecture (#9) * concat now working with new architecture * fixing functionality for pandas Series * updated append_list_of_data_managers function for concat * minor stylistic fix * remove unused append_data_manager function * fixed join * removed axis arg from join function read_csv changes and improvements in performance (#10) * Test changes to io * Update io changes * Fix performance bug 
* Debugging performance * Debugging performance on large IO * Making some performance tuning changes * Cleaning up and adding performance improvements * Cleaning up * Addressing comments * Addressing comments Fix bug Formatting fix fillna bug updated rdiv, rpow, rsub methods (#12) * updated rdiv, rpow, rsub methods * spelled dataframe wrong Fixed eval and astype (#11) * Updated to_datetime docstring * Updated astype tests * Commented out loc and iloc tests * Updated eval * removed empty space and uncommented test_loc and test_iloc Passes test_mixed_dtype_dataframe and test_nan_dataframe (#15) * Fixed describe and quantiles and cleaned up code * Updated numeric functions and handles empty dataframes * Fixed dtypes and ftypes * Imported is_numeric_dtype from pandas * Cleaned up print statements in test_dataframe.py Cleaning up and enabling tests. Fix __repr__ Removing dead code Fix where bugy Fix append error checking Fix read_csv args bug Fix read_parquet Groupby implementation Adding groupby final fix Adding docs Fix for info (#16) * Quick fix for info * Removed extraneous print statement * Restructured to use count and memory_usage instead Minor optimization change get_dummies implementation (#19) * intial code for get_dummies * Starting help on get_dummies * Fix get_dummies * Removing dead code * bug fix for get_dummies Rewrite loc (#20) * Rewrite the rewrite Finish implement loc/iloc Remove debug lines, fix typo Removing unused imports * Removing dead code * Changing naming of clone * Formatting and removing dead code * Moving imports for matching pandas * linting with yapf * Fix encoding (#21) * Fixing num_threads * Cleaned up code, added documentation, and fixed `all`, `quantiles`, and other numeric functions (#18) * Cleaned up code and added documentation * Worked on documentation * Added periods. 
* Changed all to follow the documentation * Added more method level documentation and fixed quantile with list input * Updated quantiles * Fixed dataframe to manager and reducesd descriptions to one line in data_manager.py documentation' * Fixed quantiles for sure this time * Changed numeric functions to be cleaner and work properly * Resolved comments * Fix travis dependency, add strip-hints for py2 checker * Fix mac install miniconda issue * Travis being flaky on mac * Add space to bash commands * Rework travis script * Fix pip flags * Remove -q flag * Fix python2 incompatibility * run strip-hints at py2.7 * [Travis] Fix path * [Travis] Reorder script orders * [Travis] Install py3.6 for lint * [Travis] Fix py2 import saferepr * yapf formatting * Method level documentation updates (#23) * updated documentation * formatting updates * added some docs for data_manager.py (#25) * added docs for data_manager.py starting line1250 * added and revised some func docs * lint formatting * Changing flake8 test to match yapf formatting * Resolve comments * Resolve comments * Fixing floating point error * Fix partitioning issue * lint * Addressing comments * Fix kwargs in remote task * Creating a minimum possible partition number * Fixing python2 compat * Revert test code * Turn off pytest warnings * [Travis] Remove verbosity in strip-type-hint * Fixing skew bug * Fixing agg test * Fix partitioning bug * Fix bugs * Fix test_max in axis 1 for python2 * Fix lint * Fix lint * Fix lint * Addressing comments * Final formatting * Fixing refactor
basic support for binary op (no join)
…) [upstream] not necessary for us, but better for upstream modin
No description provided.