Speed up rm bundles #4312
Speedup on dev for deleting bundles:
Note: deleting was slow before because the user's disk usage was recomputed from scratch every time a bundle was removed (_get_disk_used(user_id) was called, which searches all bundles owned by the user). To avoid this, we now subtract the sum of the sizes of the deleted bundles from the user's disk usage instead, which is much faster.
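A minimal sketch of the difference between the two approaches. It assumes each bundle is a plain dict with hypothetical owner_id, uuid, and data_size keys (not CodaLab's actual schema). The old path rescans every bundle the user owns; the new path touches only the bundles being deleted.

```python
from typing import Dict, Iterable, List


def recompute_disk_used(all_bundles: Iterable[Dict], user_id: str) -> int:
    """Old approach: scan every bundle and sum those the user owns (cost grows with total bundles)."""
    return sum(b["data_size"] for b in all_bundles if b["owner_id"] == user_id)


def decrement_disk_used(disk_used: int, deleted_sizes: Iterable[int]) -> int:
    """New approach: subtract only the sizes of the deleted bundles (cost grows with bundles deleted)."""
    return disk_used - sum(deleted_sizes)


# Hypothetical example data.
bundles: List[Dict] = [
    {"uuid": "0x1", "owner_id": "alice", "data_size": 100},
    {"uuid": "0x2", "owner_id": "alice", "data_size": 250},
    {"uuid": "0x3", "owner_id": "bob", "data_size": 75},
]

before = recompute_disk_used(bundles, "alice")  # 350
to_delete = {"0x2"}
deleted_sizes = [b["data_size"] for b in bundles if b["uuid"] in to_delete]
remaining = [b for b in bundles if b["uuid"] not in to_delete]

after = decrement_disk_used(before, deleted_sizes)  # 100
# Both approaches agree as long as the stored counter was correct beforehand.
assert after == recompute_disk_used(remaining, "alice")
```

The incremental update is only equivalent if every subtracted size was actually counted against the user in the first place, which is why the edge cases raised later in the review matter.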
…lab-worksheets into fix/4217-speed-up-rm-bundle
codalab/model/bundle_model.py
```
@@ -2708,6 +2708,14 @@ def increment_user_time_used(self, user_id, amount):
user_info['time_used'] += amount
self.update_user_info(user_info)

def increment_user_disk_used(self, user_id, amount):
```
Add type hints.
Done.
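For illustration, a type-hinted increment_user_disk_used that mirrors the increment_user_time_used pattern in the hunk above might look like this. FakeBundleModel, its _users dict, and the single-argument get_user_info are hypothetical in-memory stand-ins for the real database-backed model.

```python
from typing import Dict


class FakeBundleModel:
    """Hypothetical in-memory stand-in for the database-backed bundle model."""

    def __init__(self) -> None:
        self._users: Dict[str, Dict] = {}

    def get_user_info(self, user_id: str) -> Dict:
        # Return a copy, like reading a fresh row from the database.
        return dict(self._users[user_id])

    def update_user_info(self, user_info: Dict) -> None:
        self._users[user_info["user_id"]] = user_info

    def increment_user_disk_used(self, user_id: str, amount: int) -> None:
        """Add `amount` bytes to the user's disk_used (pass a negative value to release space)."""
        user_info = self.get_user_info(user_id)
        user_info["disk_used"] += amount
        self.update_user_info(user_info)


model = FakeBundleModel()
model.update_user_info({"user_id": "alice", "disk_used": 500})
model.increment_user_disk_used("alice", -200)
print(model.get_user_info("alice")["disk_used"])  # 300
```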
codalab/rest/bundles.py
```
# OK, now let's add our change.
bundle_data_sizes = local.model.get_bundle_metadata(relevant_uuids, 'data_size')
local.model.increment_user_disk_used(
request.user.user_id, (-1) * sum(map(int, bundle_data_sizes.values()))
```
Can just do -sum...
I changed this to use a different approach anyway.
codalab/rest/bundles.py
```
local.model.update_user_disk_used(request.user.user_id)
# Just decrement the user disk used by the sum of sizes of bundles deleted
# OK, now let's add our change.
bundle_data_sizes = local.model.get_bundle_metadata(relevant_uuids, 'data_size')
```
Does this diff produce the same result as the code above? I'm wondering whether there are subtle differences.
It turns out it did not, because of the following edge cases:
- cl rm -d: Here, the bundle contents are removed but the bundle metadata is not. For a given bundle B, we must decrement the user's disk quota when cl rm -d B is run, but not decrement it again when cl rm B is run later (since that second cl rm only deletes the metadata, not anything on disk).
- Files uploaded as symlinks: These never count toward the user's disk quota, so removing them should not decrement it.
The new changes fix this.
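One way to express these two rules: count a bundle's size toward the decrement only if its contents are still on disk and it was not uploaded as a symlink. The BundleInfo class and its data_on_disk and is_symlink flags are hypothetical; this illustrates the bookkeeping and is not the code merged in this PR.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class BundleInfo:
    uuid: str
    data_size: Optional[int]  # may be None if the size was never recorded
    data_on_disk: bool        # hypothetical flag: False once `cl rm -d` has run
    is_symlink: bool          # hypothetical flag: True for symlinked uploads


def bytes_to_release(bundles: Iterable[BundleInfo]) -> int:
    """Sum the sizes that should be subtracted from the user's disk quota."""
    return sum(
        b.data_size or 0  # treat a missing size as 0
        for b in bundles
        if b.data_on_disk and not b.is_symlink
    )


b = BundleInfo("0x1", 300, data_on_disk=True, is_symlink=False)
first = bytes_to_release([b])     # `cl rm -d 0x1`: contents deleted, release 300
b.data_on_disk = False
second = bytes_to_release([b])    # later `cl rm 0x1`: metadata only, release 0

link = BundleInfo("0x2", 50, data_on_disk=True, is_symlink=True)
third = bytes_to_release([link])  # symlinked upload was never counted, release 0
```

The caller would then pass -bytes_to_release(...) to increment_user_disk_used, which also follows the earlier -sum suggestion.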
codalab/rest/bundles.py
```
@@ -1314,7 +1315,12 @@ def delete_bundles(uuids, force, recursive, data_only, dry_run):
local.model.delete_bundles(relevant_uuids)

# Update user statistics
local.model.update_user_disk_used(request.user.user_id)
# Just decrement the user disk used by the sum of sizes of bundles deleted
# OK, now let's add our change.
```
Delete this comment line?
Done.
Ah, very nice! I wonder whether we could add a test for this.
codalab/rest/bundles.py
```
local.model.update_user_disk_used(request.user.user_id)
# Just decrement the user disk used by the sum of sizes of bundles deleted
# OK, now let's add our change.
bundle_data_sizes = local.model.get_bundle_metadata(relevant_uuids, 'data_size')
```
Also check the case where some of the bundles being deleted have no data (e.g., a bundle on which cl rm -d ... has already been run).
Checked that case with the new tests.
codalab/lib/path_util.py
```
@@ -310,20 +310,24 @@ def remove(path):

if not FileSystems.exists(path):
FileSystems.delete([path])
return
return True # not sure about this one
```
return True looks good to me.
codalab/lib/path_util.py
```
elif os.path.isdir(path):
try:
shutil.rmtree(path)
return True
except shutil.Error:
```
If there is an error, this branch has no return value, right? Should we use finally here?
Good catch! I added a return False statement at the end of the function to catch any other cases.
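A sketch of the pattern discussed here, using only the standard library (os, shutil) instead of the FileSystems abstraction in the real path_util.remove. Every branch returns explicitly, and a trailing return False covers paths that don't exist. One caveat, offered as an assumption about intent: shutil.rmtree reports failures by raising the underlying OSError rather than shutil.Error (which is itself a subclass of OSError), so except shutil.Error may not catch the failures it is meant to.

```python
import os
import shutil
import tempfile


def remove(path: str) -> bool:
    """Remove a file, symlink, or directory tree.

    Returns True if something was deleted and False otherwise, so the caller
    can decide whether to decrement the user's disk quota.
    """
    if os.path.islink(path):
        os.unlink(path)  # remove the link itself, never the target it points to
        return True
    if os.path.isdir(path):
        try:
            shutil.rmtree(path)
            return True
        except OSError:  # rmtree raises OSError on failure
            return False
    if os.path.isfile(path):
        os.remove(path)
        return True
    return False  # nothing existed at this path


# Usage: removing the same directory tree twice.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "bundle_contents")
os.makedirs(os.path.join(target, "sub"))
print(remove(target))  # True: the tree existed and was deleted
print(remove(target))  # False: nothing left to delete
os.rmdir(workdir)
```

A trailing return False is preferable to finally here: a return inside a finally clause would override the return values of the other branches and silently swallow any exception raised in the try block.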
I made some changes to handle edge cases. As far as I could tell, there were previously no tests verifying that the disk quota is incremented and decremented correctly on file upload and deletion, so I added tests checking that it is updated correctly upon:
These tests verify that my rm change still behaves correctly and that our code tracks disk quota usage correctly in various cases.
Now ready for final review, @percyliang @wwwjn. Thanks for the reviews!
…ix/4217-speed-up-rm-bundle
Reasons for making this change
Speed up rm bundles since it's very slow on the Stanford instance (and is slow in general on any instance that has lots of bundles).
Related issues
#4217