initial attempt at a fast init #726
`inplace=True` seldom gives you a speed-up with pandas: https://stackoverflow.com/a/60020384/2873952. `reset_index`, `fillna`, and `clip` are exceptions, but there is no good list.

In general, if the size and the data type of the data stay the same, and the data would otherwise have to be copied to avoid operating in place, then there might be a sizable speed-up. In other situations it most likely does not change anything (other than forcing your users to be careful).

`drop` along columns can be done on a view, so it doesn't matter. `dropna` might actually be faster, but you probably have to copy data in both cases.
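To make the trade-off concrete, here is a minimal sketch rather than a benchmark from this PR. The frame, the column names, and the `compare_fillna` helper are made up for illustration. It shows that the in-place and copying forms of `fillna` give the same values, that `drop(columns=...)` leaves the original untouched, and how one could time the two variants before deciding.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical example data; not taken from the PR.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})

# Copying form: returns a new DataFrame, the original keeps its NaNs.
filled = df.fillna(0.0)

# In-place form: mutates the object it is called on.
df_inplace = df.copy()
df_inplace.fillna(0.0, inplace=True)

# Both routes end up with the same values.
same_values = filled.equals(df_inplace)

# Dropping a column without inplace leaves the caller's frame intact.
dropped = df.drop(columns=["b"])


def compare_fillna(n_rows=100_000, repeats=20):
    """Time copying vs. in-place fillna on a float column (illustrative helper)."""
    base = pd.DataFrame({"x": np.random.rand(n_rows)})
    base.loc[::10, "x"] = np.nan

    def copying():
        base.fillna(0.0)

    def in_place():
        # Copy first so every run starts from the same data; the copy
        # cost is included, so read the numbers with that in mind.
        tmp = base.copy()
        tmp.fillna(0.0, inplace=True)

    return (
        timeit.timeit(copying, number=repeats),
        timeit.timeit(in_place, number=repeats),
    )
```

Calling `compare_fillna()` on data shaped like the real input is probably the quickest way to check whether `inplace=True` buys anything here. Results will vary with the pandas version and the dtypes involved.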
Note that this function is mostly copied and pasted from `format_data()`, with a few optimisations. We could in principle break out the copied parts into separate methods that are called by both.
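One possible shape for that refactoring is sketched below. Everything in it is an assumption made for illustration: the `Formatter` class, the `format_data_fast` name, the `_check_columns` and `_sort_by_time` helpers, and the column names. None of them reflect the actual code in this PR. The point is only that both entry points call the same small helpers, so the duplicated logic lives in one place.

```python
import pandas as pd


class Formatter:
    """Hypothetical stand-in for the class under review."""

    required = ("time", "value")

    def _check_columns(self, df):
        # Shared step: fail early if an expected column is absent.
        missing = [c for c in self.required if c not in df.columns]
        if missing:
            raise ValueError(f"missing columns: {missing}")

    def _sort_by_time(self, df):
        # Shared step: return a new, time-ordered frame.
        return df.sort_values("time").reset_index(drop=True)

    def format_data(self, df):
        self._check_columns(df)
        # Placeholder for the more thorough steps of the full path.
        df = df.dropna(subset=["value"])
        return self._sort_by_time(df)

    def format_data_fast(self, df):
        # Fast path: reuse the shared helpers and skip the extra cleaning.
        self._check_columns(df)
        return self._sort_by_time(df)


raw = pd.DataFrame({"time": [2, 1, 3], "value": [20.0, 10.0, 30.0]})
fmt = Formatter()
slow = fmt.format_data(raw)
fast = fmt.format_data_fast(raw)
```

With clean input both paths return the same frame, which also makes for a cheap regression test if the two functions are kept side by side.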