Updated module for handling large datasets #79
Conversation
I tried chunking the dataset, but it was getting complex, so I used a Dask DataFrame instead for better memory efficiency when loading and handling large datasets.
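A minimal sketch of the Dask-based loading described here, not the module's actual code; the file path is hypothetical:

```python
import dask.dataframe as dd

# dd.read_csv lazily splits the file into partitions instead of
# loading everything into memory at once
df = dd.read_csv("large_dataset.csv")

# Operations build a lazy task graph; data is only read when
# .compute() materializes the result
summary = df.describe().compute()
print(summary)
```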
Please give steps for testing.
Actually, I added this function to the original module, but significant changes have been made since then, such as methods like .analyse that got included when I pulled origin before committing. After those changes were made to the repo, I think the module itself stopped working. If you want, I can send you test.py to test my method, because the module itself is currently not working for me.
Yes @DarshAgrawal14, I can see merge conflicts. Please do a git pull and bring your code changes in there.
@ombhojane I have resolved the issue; the PR is ready to merge.
@ombhojane, if you require any changes, please let me know.
Thanks for contributing!
Fixed issue: #2
I initially started with a chunking approach that divided the dataset into chunks, but managing the chunks made the class complex and slow.
I therefore used Dask DataFrames instead of pandas, which improves memory efficiency and can handle datasets of any size without interfering with the other functions. A rough sketch contrasting the two approaches follows.
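For contrast, a minimal sketch of the chunked-pandas pattern the description says was dropped; the file path and the "value" column are hypothetical. Every partial result has to be accumulated and merged by hand, which is the bookkeeping that made the class complex:

```python
import pandas as pd

# Chunked reading keeps memory bounded, but each statistic must be
# accumulated manually across chunks ("value" is a hypothetical column)
total, count = 0.0, 0
for chunk in pd.read_csv("large_dataset.csv", chunksize=100_000):
    total += chunk["value"].sum()
    count += len(chunk)

print("mean:", total / count)
```

With Dask, the same mean is a single lazy expression, `df["value"].mean().compute()`, with the partitioning handled internally.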
Darsh Agrawal, GSSoC '24 Extended contributor