Make examples_torch_job faster #27437

Merged: 1 commit, Nov 10, 2023
2 changes: 2 additions & 0 deletions .circleci/create_circleci_config.py
@@ -397,6 +397,7 @@ def job_name(self):

examples_torch_job = CircleCIJob(
"examples_torch",
additional_env={"OMP_NUM_THREADS": 8},
Collaborator Author commented:
The original value of 1 makes some tests slow, and they sometimes time out (> 120 s).

Collaborator commented:
This feels like something which is going to make things very hard to debug and cause flaky tests: can we guarantee the same threads will receive the same jobs each time?

ydshieh (Collaborator Author) replied on Nov 10, 2023:

This doesn't mean the jobs will be sent to different threads. It's really low level: a Python process (that uses some C module that uses OpenMP) dispatches some (sub)tasks to different threads, I believe.

At the test level, that is controlled by pytest -n x, where x is the number of workers (processes) used to run different tests. And so far we use pytest -n 8 without the same worker receiving the same jobs each time.
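(Editorial illustration, not part of the PR: a minimal sketch of the two parallelism levels described above, assuming a CPU build of torch with OpenMP support.)

```python
# Minimal illustrative sketch (not from this PR) of the two parallelism levels
# discussed above, assuming a CPU build of torch with OpenMP support.
import os

# This is what additional_env={"OMP_NUM_THREADS": 8} does for the CI job; it must
# be set before torch builds its thread pool, hence before the torch import.
os.environ.setdefault("OMP_NUM_THREADS", "8")

import torch

# Intra-op thread count used by torch's C/C++ kernels inside *this* process;
# with an OpenMP build it typically follows OMP_NUM_THREADS unless
# torch.set_num_threads() overrides it.
print("intra-op threads per process:", torch.get_num_threads())

# Process-level parallelism, by contrast, comes from pytest-xdist, e.g.:
#   python -m pytest -n 8 --dist=loadfile examples/pytorch
# "-n 8" spawns 8 worker processes, each with its own OpenMP thread pool, so the
# two settings multiply rather than overlap.
```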

Collaborator replied:
OK! :)

cache_name="torch_examples",
install_steps=[
"sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng",
@@ -405,6 +406,7 @@ def job_name(self):
"pip install -U --upgrade-strategy eager -r examples/pytorch/_tests_requirements.txt",
"pip install -U --upgrade-strategy eager -e git+https://github.com/huggingface/accelerate@main#egg=accelerate",
],
pytest_num_workers=1,
Collaborator Author commented:
When OMP_NUM_THREADS>1, we should set this to 1.

(Well, since we have dist=loadfile in the pytest options and only 2 files are concerned, we won't see the problem. But let's not risk our future life ...)

)
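(Editorial illustration of the comment above on OMP_NUM_THREADS and pytest_num_workers: a quick sketch of the oversubscription that pytest_num_workers=1 avoids; only OMP_NUM_THREADS=8 comes from this diff, the worker count is a hypothetical "before" value.)

```python
# Illustrative sketch of the oversubscription arithmetic behind pytest_num_workers=1.
# Only OMP_NUM_THREADS=8 comes from this diff; the worker count of 8 is a
# hypothetical "before" value.
import os

pytest_workers = 8  # hypothetical: a pytest -n value used without this change
omp_threads = int(os.environ.get("OMP_NUM_THREADS", "8"))

total = pytest_workers * omp_threads
print(f"{pytest_workers} pytest workers x {omp_threads} OpenMP threads = {total} CPU threads requested")
# A small CircleCI executor has far fewer cores than that, so with OMP_NUM_THREADS > 1
# the job keeps a single pytest worker instead (pytest_num_workers=1).
```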

