Make examples_torch_job faster #27437
@@ -397,6 +397,7 @@ def job_name(self):

 examples_torch_job = CircleCIJob(
     "examples_torch",
+    additional_env={"OMP_NUM_THREADS": 8},
The original value of 1 makes some tests slow, and they sometimes time out (> 120 s).
This feels like something which is going to make things very hard to debug and cause flaky tests: can we guarantee the same threads will receive the same jobs each time?
This doesn't mean the jobs will be sent to different threads. It happens at a much lower level: a Python process (using some C module that uses OpenMP) dispatches (sub)tasks to different threads, I believe.
At the test level, that is controlled by `pytest -n x`, where `x` is the number of workers (processes) that run different tests. So far we have used `pytest -n 8` without any guarantee that the same worker receives the same jobs each time.
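To make the two levels concrete, here is a rough sketch (not code from this PR). It assumes each pytest worker is a separate process and that each process may start up to `OMP_NUM_THREADS` OpenMP threads. The helper names `thread_budget` and `is_oversubscribed` are hypothetical and exist only for illustration.

```python
import os
from typing import Optional


def thread_budget(pytest_workers: int, omp_threads: int) -> int:
    """Rough upper bound on OpenMP threads that can be active at once.

    Assumption: every pytest worker is its own process, and every process
    may use up to `omp_threads` OpenMP threads.
    """
    return pytest_workers * omp_threads


def is_oversubscribed(
    pytest_workers: int, omp_threads: int, cpu_count: Optional[int] = None
) -> bool:
    """Return True if the thread budget exceeds the number of CPUs."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return thread_budget(pytest_workers, omp_threads) > cpus
```

Under these assumptions, `pytest -n 8` with `OMP_NUM_THREADS=1` and `pytest -n 1` with `OMP_NUM_THREADS=8` have the same budget of 8 threads, while combining both settings at 8 would give a budget of 64.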
OK! :)
@@ -405,6 +406,7 @@ def job_name(self):
         "pip install -U --upgrade-strategy eager -r examples/pytorch/_tests_requirements.txt",
         "pip install -U --upgrade-strategy eager -e git+https://github.com/huggingface/accelerate@main#egg=accelerate",
     ],
+    pytest_num_workers=1,
When `OMP_NUM_THREADS>1`, we should set this to `1`.
(Well, since we have `dist=loadfile` in the `pytest` options and only 2 files are concerned, we won't see the problems. But let's not risk our future life ...)
The documentation is not available anymore as the PR was closed or merged.
Thanks for updating!
fix Co-authored-by: ydshieh <[email protected]>
What does this PR do?
This job sometimes has tests that time out (> 120 s). Even when the job passes, the log looks like (log excerpt not preserved).
It turns out that setting `OMP_NUM_THREADS=1` has a huge impact on this job. This PR sets `OMP_NUM_THREADS=8` to make it run faster. It now looks like (log excerpt not preserved).
Note that setting `OMP_NUM_THREADS>1` with `pytest -n`, where `n > 1`, is going to break things (timeouts, blocked jobs, etc.).
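The constraint in this note could be enforced with a small guard. The sketch below is only an illustration: `check_parallelism` is a hypothetical helper, not part of `CircleCIJob`, and it assumes `OMP_NUM_THREADS` counts as 1 when it is absent from the env dict.

```python
def check_parallelism(additional_env: dict, pytest_num_workers: int) -> None:
    """Reject configs that combine multi-threaded OpenMP with several pytest workers.

    Hypothetical helper for illustration only; it is not part of the CI config.
    """
    # Assumption: a missing OMP_NUM_THREADS is treated as 1.
    omp_threads = int(additional_env.get("OMP_NUM_THREADS", 1))
    if omp_threads > 1 and pytest_num_workers > 1:
        raise ValueError(
            f"OMP_NUM_THREADS={omp_threads} with pytest -n {pytest_num_workers}: "
            "set pytest_num_workers=1 when OMP_NUM_THREADS > 1"
        )
```

With the values from this PR, `check_parallelism({"OMP_NUM_THREADS": 8}, 1)` passes silently, while a combination such as 8 threads and 8 workers raises `ValueError`.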