-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up disk quota computation #4354
Conversation
…, which is mcuh faster
TODO @AndrewJGaut : Add benchmarking |
…h the data_size stuff. I'm not surprised that it's hard to predict for a running bundle, but I am surprised that something weird happens when uploading two things at once. The bundle size isn't the sum of the two archives. Very weird
@@ -470,8 +472,6 @@ def upload_to_bundle_store( | |||
params={'success': False, 'error_msg': f'Bypass server upload error. {err}',}, | |||
) | |||
raise err | |||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't see it in this abridged view, but if you look at the full file the code looks ilke:
try:
....
self._client.update_bundle_state(bundle['id'], params={'success': True})
except Exception as err:
...
else:
self._client.update_bundle_state(bundle['id'], params={'success': True})
What does this mean? Well, the 'else' branch triggers in a try-catch block when no error is thrown. In other words, when no errors was thrown, the self._client.update_bundle_state(bundle['id'], params={'success': True})
call was triggered twice. This wasn't an issue (except for efficiency reasons) before since the update wasn't incremental (it just reset the whole value), but it was for me since the disk increment happened twice.
) | ||
_run_command([cl, 'rm', uuid]) | ||
check_equals(disk_used, _run_command([cl, 'uinfo', 'codalab', '-f', 'disk_used'])) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these test code paths end up calling cleanup_existing_contents
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, shoot. No, I don't think so. Let me add one more test for that before I merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching that!
cd195d3
to
47a8b47
Compare
Reasons for making this change
Current method of updating disk quota is slow. Basically, currently the disk quota is updated using the following function:
and then setting the disk quota to the returned value. However, doing such a search is quite slow, and so we want to do it faster. This PR speeds up the computation and just increments the user's disk quota each time by the correct amount.
This PR also adds a bunch of disk quota tests.
Related issues
#4350 and #4351