[Usage stats] Record Ray native library usage from a home temp folder #25842
# We need a file lock in order to
# Use a short timeout to avoid a long delay at import time.
with TempFileLock(str(self.lib_usage_file), timeout=2):
    with self.lib_usage_file.open("a+") as f:
Don't use a file lock, just touch separate files for each key.
I think this can still be corrupted (a race condition) if more than one instance or driver is running concurrently.
Touch a separate file (don't write anything, just create it) for each library. They can have separate file names.
Basically, mirror the kv put strategy.
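A minimal sketch of the reviewer's suggestion, under stated assumptions: each library gets its own empty marker file, so recording a usage is a single file creation and reading back the used libraries is a directory listing. The `_ray_lib_usage-` prefix and both function names are hypothetical; the PR's actual file names and helpers are not shown here.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical marker-file prefix; the PR's real naming scheme is not shown.
LIB_USAGE_PREFIX = "_ray_lib_usage-"


def record_library_usage(temp_dir: Path, lib_name: str) -> None:
    """Record that `lib_name` was used by creating an empty marker file.

    Nothing is ever written into the file, so concurrent callers cannot
    interleave partial writes; at worst several processes create the
    same empty file.
    """
    temp_dir.mkdir(parents=True, exist_ok=True)
    (temp_dir / f"{LIB_USAGE_PREFIX}{lib_name}").touch(exist_ok=True)


def read_library_usages(temp_dir: Path) -> List[str]:
    """Return every library that has a marker file, sorted for a stable order."""
    if not temp_dir.is_dir():
        return []
    return sorted(
        p.name[len(LIB_USAGE_PREFIX):]
        for p in temp_dir.glob(f"{LIB_USAGE_PREFIX}*")
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        usage_dir = Path(d) / "ray"
        record_library_usage(usage_dir, "tune")
        record_library_usage(usage_dir, "tune")  # recording twice is harmless
        record_library_usage(usage_dir, "rllib")
        print(read_library_usages(usage_dir))  # ['rllib', 'tune']
```

Because each library is an independent file, there is no read-modify-write of a shared file and therefore no lock to hold around it.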
Ah, I see. That's a great idea. Let me follow up.
Addressed. It seems like Windows doesn't support atomic file creation, so I used FileLock there. On Linux, creating an empty file with pathlib.Path.touch() writes no content (and touch(exist_ok=False) opens with O_EXCL), so it seems safe to just use touch(exist_ok=True).
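The file-creation behavior this comment relies on can be checked with the standard library alone. In CPython, `Path.touch(exist_ok=False)` opens with `O_CREAT | O_EXCL`, so exactly one caller can create the file, while `touch(exist_ok=True)` simply succeeds if the file is already there. Either way, an empty marker file cannot end up half-written. Threads stand in for concurrent drivers here, and the helper names are illustrative, not from the PR.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def touch_concurrently(path: Path, workers: int = 8) -> None:
    """Have several threads create the same empty marker file at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the results so any exception from a worker is re-raised here.
        list(pool.map(lambda _: path.touch(exist_ok=True), range(workers)))


def create_exclusively(path: Path) -> bool:
    """Return True if this call created `path`, False if it already existed.

    With exist_ok=False, Path.touch() adds O_EXCL, so only one caller can win.
    """
    try:
        path.touch(exist_ok=False)
        return True
    except FileExistsError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        marker = Path(d) / "rllib"
        touch_concurrently(marker)
        print(marker.exists(), marker.stat().st_size)  # True 0
        print(create_exclusively(marker))  # False: the file already exists
```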
if lib_name not in libs:
    f.seek(0, io.SEEK_END)
    self._write(f, lib_name)
TODO: add docstring.
ray._private.utils.get_ray_temp_dir()
).read()
for library_usage in historical_lib_usages:
    if library_usage not in result:
The overhead should be pretty small, as there are only 5 libraries.
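The merge step above, sketched as a standalone function under the assumption that the historical usages have already been parsed into a list of library names. The function name `merge_library_usages` is hypothetical. A plain list with linear `in` checks keeps first-seen order, and with only five possible libraries the quadratic worst case is negligible.

```python
from typing import Iterable, List


def merge_library_usages(current: Iterable[str], historical: Iterable[str]) -> List[str]:
    """Append historical library usages that are not already in `current`.

    Keeps first-seen order. A linear `in` check on a list is fine here
    because there are only a handful of Ray libraries.
    """
    result = list(current)
    for library_usage in historical:
        if library_usage not in result:
            result.append(library_usage)
    return result


print(merge_library_usages(["tune"], ["rllib", "tune", "train"]))
# ['tune', 'rllib', 'train']
```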
python/ray/util/ml_utils/filelock.py
import tempfile
from pathlib import Path

if sys.platform == "win32":
I realized the basic FileLock is Unix-only.
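Why a Unix-only lock needs a separate Windows path can be sketched with the standard library: the `fcntl` module exists only on Unix, and Windows offers `msvcrt.locking` instead. The `exclusive_lock` context manager below is an illustrative helper of our own, not Ray's `TempFileLock` or the `filelock` package, and its Windows branch is an untested assumption.

```python
import os
import sys
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def exclusive_lock(lock_path: Path) -> Iterator[None]:
    """Hold an exclusive, blocking lock on `lock_path` for the duration of the block."""
    fd = os.open(str(lock_path), os.O_RDWR | os.O_CREAT, 0o666)
    try:
        if sys.platform == "win32":
            import msvcrt  # Windows-only module

            # Lock the first byte; LK_LOCK retries for about 10 s before raising OSError.
            msvcrt.locking(fd, msvcrt.LK_LOCK, 1)
            try:
                yield
            finally:
                os.lseek(fd, 0, os.SEEK_SET)  # unlock the same byte we locked
                msvcrt.locking(fd, msvcrt.LK_UNLCK, 1)
        else:
            import fcntl  # Unix-only module

            fcntl.flock(fd, fcntl.LOCK_EX)
            try:
                yield
            finally:
                fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with exclusive_lock(Path(d) / "lib_usage.lock"):
            print("holding the lock")
```

Both branches block until the lock is free, so the calling code can stay platform-agnostic.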
It will be ready today!
Working on fixing the weird test failure + Windows issue.
ray-project#25842 is not needed since we will no longer accidentally create a new cluster while an existing one is running after ray-project#26678 Signed-off-by: Edward Oakes <[email protected]>
ray-project#25842 is not needed since we will no longer accidentally create a new cluster while an existing one is running after ray-project#26678 Signed-off-by: elliottower <[email protected]>
Why are these changes needed?
This PR records historical Ray native library usage in the home temp folder. Library usage only covers the Ray native libraries (rllib, tune, dataset, workflow, and train). NOTE: Library usage is always recorded to /tmp/ray, but it is only reported when a cluster with usage stats enabled is running. This can produce a fairly large number of false positives (e.g., if I import rllib once and later start a cluster for local development, that cluster will be counted as an rllib cluster).
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.