-
Notifications
You must be signed in to change notification settings - Fork 26.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix doctest files fetch issue #23277
Fix doctest files fetch issue #23277
Conversation
The documentation is not available anymore as the PR was closed or merged. |
As said offline I don't think we need to revert urgently, we can just ignore the red check on main. |
"git remote add upstream https://github.com/huggingface/transformers.git && git fetch upstream \n" | ||
"git diff --name-only --relative --diff-filter=AMR refs/remotes/upstream/main...HEAD | grep -E '\.(py|mdx)$' | grep -Ev '^\..*|/\.' | grep -Ev '__' > pr_documentation_tests.txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the stuff that probably needs to be different on main vs a PR branch. The test fetcher has this test that can be reused/adapted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problems comes from the fact grep
gives exit code 1
if the input is empty.
circleci@9576b4afebd1:~/transformers$ echo "" | grep -E '\.(mdx)$'
circleci@9576b4afebd1:~/transformers$ echo $?
1
Considering the chain of 3 grep in the 2nd line here, I think it's best to have a python function to do things.
67a269a
to
7ccebe6
Compare
Refactor doctests + add CI
to main
"@@ -432,10 +432,11 @@ def job_name(self): | |||
tests_to_run="tests/repo_utils", | |||
) | |||
|
|||
# At this moment, only the files that are in `utils/documentation_tests.txt` will be kept (together with a dummy file). | |||
py_command = 'import os; import json; fp = open("pr_documentation_tests.txt"); data_1 = fp.read().strip().split("\\n"); fp = open("utils/documentation_tests.txt"); data_2 = fp.read().strip().split("\\n"); to_test = [x for x in data_1 if x in set(data_2)] + ["dummy.py"]; to_test = " ".join(to_test); print(to_test)' | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put the logic to some new functions in utils/tests_fetcher.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's cleaner thanks!
@@ -216,6 +216,66 @@ def get_modified_python_files(diff_with_last_commit=False): | |||
return get_diff(repo, repo.head.commit, parent_commits) | |||
|
|||
|
|||
def get_diff_for_py_and_mdx_files(repo, base_commit, commits): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just copies of existing functions with some modifications for doctest test files.
e8ee1c1
to
863dcf5
Compare
@sgugger FYI The doctest PR also has some problem running on our GitHub Actions runner. See error below. _____ ERROR collecting src/transformers/generation/configuration_utils.py ______
import file mismatch:
imported module 'transformers.generation.configuration_utils' has this __file__ attribute:
/transformers/src/transformers/generation/configuration_utils.py
which is not the same as the test file we want to collect:
/__w/transformers/transformers/src/transformers/generation/configuration_utils.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the cleanup! I just have a question on why the dummy.py. Maybe the comment can be more explicit on why it's needed?
@@ -432,10 +432,11 @@ def job_name(self): | |||
tests_to_run="tests/repo_utils", | |||
) | |||
|
|||
# At this moment, only the files that are in `utils/documentation_tests.txt` will be kept (together with a dummy file). | |||
py_command = 'import os; import json; fp = open("pr_documentation_tests.txt"); data_1 = fp.read().strip().split("\\n"); fp = open("utils/documentation_tests.txt"); data_2 = fp.read().strip().split("\\n"); to_test = [x for x in data_1 if x in set(data_2)] + ["dummy.py"]; to_test = " ".join(to_test); print(to_test)' | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's cleaner thanks!
@sgugger A short fix for this issue is given in the last commit. The reason is the file This change should be applied to other workflow file, but it's rare that we have such imports in the codebase. I will do it in a separate PR. |
Thanks for all the explanation! |
Going to merge as the failing tests are irrelevant and I have tried to re-run for a few times. |
* fix --------- Co-authored-by: ydshieh <[email protected]>
* fix --------- Co-authored-by: ydshieh <[email protected]>
* fix --------- Co-authored-by: ydshieh <[email protected]>
Reverts #23271
Embarrassingly and unfortunately, the new job
tests_pr_documientation_tests
fails at the stepGet files to test
when the job is run on themain
branch.https://app.circleci.com/pipelines/github/huggingface/transformers/64235/workflows/54a99003-258e-4c2a-8366-b4461b3ec33f/jobs/794628/parallel-runs/0/steps/0-113
I will have to take a look - the log is not informative.