
Make examples_torch_job faster #27437

Merged: ydshieh merged 1 commit into main on Nov 10, 2023
Conversation


@ydshieh ydshieh commented Nov 10, 2023

What does this PR do?

This job sometimes has tests that time out (> 120s). Even when the job passes, the log looks like:

115.55s call     examples/pytorch/test_pytorch_examples.py::ExamplesTests::test_run_semantic_segmentation
66.07s call     examples/pytorch/test_pytorch_examples.py::ExamplesTests::test_run_squad
48.24s call     examples/pytorch/test_pytorch_examples.py::ExamplesTests::test_run_speech_recognition_seq2seq

It turns out that the value of OMP_NUM_THREADS has a huge impact on this job: the original value of 1 makes it slow.

This PR sets OMP_NUM_THREADS=8 to make the job run faster. The timings now look like:

25.78s call     examples/pytorch/test_pytorch_examples.py::ExamplesTests::test_run_semantic_segmentation
22.82s call     examples/pytorch/test_accelerate_examples.py::ExamplesTestsNoTrainer::test_run_swag_no_trainer
20.69s call     examples/pytorch/test_pytorch_examples.py::ExamplesTests::test_run_speech_recognition_seq2seq

Note that setting OMP_NUM_THREADS > 1 together with pytest -n N where N > 1 is going to break things (timeouts, blocked processes, etc.).
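
For context, here is a minimal sketch of how one might reproduce the comparison locally. This is not part of the PR: the test id is taken from the timings above, and the rest of the script is an assumption.

import os
import subprocess

# Hypothetical local check: run one of the slow example tests under each
# OMP_NUM_THREADS value and let pytest report the durations.
TEST_ID = (
    "examples/pytorch/test_pytorch_examples.py"
    "::ExamplesTests::test_run_semantic_segmentation"
)

for n_threads in ("1", "8"):
    print(f"--- OMP_NUM_THREADS={n_threads} ---")
    subprocess.run(
        ["python", "-m", "pytest", "--durations=3", TEST_ID],
        env={**os.environ, "OMP_NUM_THREADS": n_threads},
        check=False,  # we only care about the reported durations, not the exit code
    )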

@@ -397,6 +397,7 @@ def job_name(self):

examples_torch_job = CircleCIJob(
"examples_torch",
additional_env={"OMP_NUM_THREADS": 8},
ydshieh (Collaborator, Author):

The original value of 1 makes some tests slow and sometimes causes timeouts (> 120s).

Collaborator:

This feels like something which is going to make things very hard to debug and cause flaky tests: can we guarantee the same threads will receive the same jobs each time?

@ydshieh ydshieh (Collaborator, Author) Nov 10, 2023:

This doesn't mean the test jobs will be sent to different threads. It's really low level: a Python process (one that uses a C module which uses OpenMP) dispatches some (sub)tasks to different threads, I believe.

At the test level, parallelism is controlled by pytest -n x, where x is the number of workers (processes) used to run different tests. And so far we have used pytest -n 8 without any guarantee that the same worker receives the same tests each time.
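
As an illustration of the distinction (a sketch, not code from this PR; it assumes a PyTorch build with OpenMP support): OMP_NUM_THREADS, read at import time, sets the intra-op thread pool of each test process, independently of how many pytest-xdist workers are running.

import os

# Must be set before `import torch`, otherwise the default thread count is used.
os.environ["OMP_NUM_THREADS"] = "8"

import torch

# Each Python process gets its own intra-op thread pool of roughly this size,
# regardless of the number of pytest workers launched with `pytest -n x`.
print(torch.get_num_threads())  # typically prints 8 here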

Collaborator:
OK! :)

@@ -405,6 +406,7 @@ def job_name(self):
"pip install -U --upgrade-strategy eager -r examples/pytorch/_tests_requirements.txt",
"pip install -U --upgrade-strategy eager -e git+https://github.com/huggingface/accelerate@main#egg=accelerate",
],
pytest_num_workers=1,
ydshieh (Collaborator, Author):

When OMP_NUM_THREADS > 1, we should set this to 1.

(Well, since we pass dist=loadfile in the pytest options and only 2 test files are concerned, we wouldn't actually see the problems. But let's not risk our future lives...)
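
Putting the two hunks together, the relevant part of the job definition would look roughly like the following. This is a sketch assembled only from the lines visible in this diff; the name of the install-steps argument and any omitted CircleCIJob arguments are assumptions.

examples_torch_job = CircleCIJob(
    "examples_torch",
    additional_env={"OMP_NUM_THREADS": 8},  # effectively 1 before this PR
    install_steps=[  # argument name assumed; the diff only shows the list entries
        "pip install -U --upgrade-strategy eager -r examples/pytorch/_tests_requirements.txt",
        "pip install -U --upgrade-strategy eager -e git+https://github.com/huggingface/accelerate@main#egg=accelerate",
    ],
    pytest_num_workers=1,  # keep a single pytest-xdist worker while OMP_NUM_THREADS > 1
)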

@HuggingFaceDocBuilderDev commented Nov 10, 2023:

The documentation is not available anymore as the PR was closed or merged.

@amyeroberts amyeroberts (Collaborator) left a comment:

Thanks for updating!

@ydshieh ydshieh merged commit 7ee995f into main Nov 10, 2023
22 checks passed
@ydshieh ydshieh deleted the check_example_job_p6 branch November 10, 2023 19:05
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023