Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more tests on the pre-commit hook #722

Closed
13 tasks done
mwouts opened this issue Jan 21, 2021 · 15 comments · Fixed by #724
Closed
13 tasks done

Add more tests on the pre-commit hook #722

mwouts opened this issue Jan 21, 2021 · 15 comments · Fixed by #724
Milestone

Comments

@mwouts
Copy link
Owner

mwouts commented Jan 21, 2021

@JohnPaton and @Skylion007 have developed a pre-commit hook for this repo - that was an impressive work #698!

Here is a list of small improvements that we identified during the PR and that I'd like to include in the next release:

  • Add an entry to the CHANGELOG
  • Update the documentation for the --to test to match the example that is actually tested
  • In the documentation, the emphasis should be on the hook for the pre-commit framework. We could move the documentation for the manual pre-commit hook to another section.
  • Split the tests on the pre-commit into multiple files. Again the emphasis should be on the new tests and on the use cases documented in the documentation
  • Refactor the tests to make them easier to read, using fixtures. If possible, test the error text (currently we just test SystemExit which does not contain a detailed error message)
  • Change the jupytext CLI to avoid writing files that don't change (only update their timestamp). And make sure that this fixes the pre-commit hook when run on file types .py + .ipynb (or even tripled-paired notebooks)
  • Test and document a --sync --pipe black mode.
  • Test and document a --sync + nbstripout mode.
  • Test and document a --sync --execute mode.
  • Find a way to safely reformat long markdown cells (without breaking markdown tables...)
  • Test modifying the .py file in the --sync mode
  • Test the hook in the context of pre-commit run --all (no need to add files to the git index in that context)
  • Test moving a paired notebook
@mwouts
Copy link
Owner Author

mwouts commented Jan 21, 2021

@JohnPaton , I have a question for you... I was not sure to get the correct error message when an output is both tracked and modified. Is it correct that we should raise an alert also in that case? I am thinking of this: 5ef5f0c

@JohnPaton
Copy link
Contributor

In that case pre-commit itself will give an appropriate error (something like <file> has been modified) so that's why I didn't include anything explicit for that case.

@mwouts
Copy link
Owner Author

mwouts commented Jan 23, 2021

In that case pre-commit itself will give an appropriate error (something like has been modified) so that's why I didn't include anything explicit for that case.

Yes, that was what I though. I think still I prefer to add a more explicit error message, if you don't mind. Maybe when I'm done with the other items we can do a review and decide whether we should change the name of the --alert-untracked argument or not.

Now let me give a quick update...

  • I have made some progress in the direction of not rewriting the files that don't need an update and that seems to work well
  • However I am experiencing issues with the --sync mode (cf. that branch or this run of the CI: https://github.com/mwouts/jupytext/runs/1754956472?check_suite_focus=true). Namely:
    • If I modify only the test.ipynb file, but add both the test.ipynb and test.py file, the notebook may be overwritten with the outdated version from the test.py file
    • If I modify only the test.ipynb file and add it to git, then try to commit, it fails as expected and updates the .py file. If now I add the .py file to the index and commit, everything goes fine. But if I don't add the .py file and try to commit, then pre-commit warns of Unstaged files which have the effect to revert the .ipynb file to its previous state:
[WARNING] Unstaged files detected.
[INFO] Stashing unstaged files to /home/marc/.cache/pre-commit/patch1611434470.
jupytext.................................................................Failed
- hook id: jupytext
- exit code: 1
- files were modified by this hook

[jupytext] Reading test.ipynb in format ipynb
[jupytext] Loading 'test.py'
[jupytext] Updating 'test.ipynb'
[jupytext] Outdated git index! Please run 'git add test.ipynb'
[jupytext] Updating the timestamp of 'test.py'

[WARNING] Stashed changes conflicted with hook auto-fixes... Rolling back fixes...
[INFO] Restored changes from /home/marc/.cache/pre-commit/patch1611434470.

I think @JohnPaton you were right to focus on the --to mode, it is indeed much safer 😄

Clearly the problem of the --sync mode is with the use of timestamps. John, Aaron, what do you think of the following suggestion? Instead of using timestamps, we could consider that the most up-to-date files are those in the git index. If the index contains a single file (among the pair), Jupytext would update the others files in the pair based on this one. And if multiple paired files are in the index, Jupytext should fail if they contain inconsistent changes, and do nothing otherwise.

@JohnPaton
Copy link
Contributor

I think failing if the changes are inconsistent is a sane choice, this is also what git itself does of course so you can even borrow their conflict language.

How will this affect current users of --sync, especially if they are not using git?

@mwouts
Copy link
Owner Author

mwouts commented Jan 25, 2021

I think failing if the changes are inconsistent is a sane choice, this is also what git itself does of course so you can even borrow their conflict language.

Agreed! Good idea... by the way I was also planning to show (a fraction?) of the conflict itself because otherwise it might be hard to find out what is going on.

How will this affect current users of --sync, especially if they are not using git?

This will not affect the default --sync mode, as it will be triggered by using the pre-commit mode (maybe we could rename the --alert-untracked argument to something like --pre-commit-framework)

@JohnPaton
Copy link
Contributor

Agreed! Shouldn't be a big deal to rename since that option is only used in the context of the hook right now anyway

@mwouts
Copy link
Owner Author

mwouts commented Jan 26, 2021

Hi @JohnPaton , @Skylion007 , a quick update: as discussed above we will now use the inclusion or not in the git index, rather than the timestamp to determine the source notebook when doing --sync --pre-commit-mode. That seems to work well (much better than timestamps)!

Now what remains is to cover a few more use cases (combine with black, combine with nbstripout, call --execute), and I guess we will have a safe release and an interesting blog post!

@Skylion007
Copy link
Contributor

Skylion007 commented Jan 26, 2021 via email

@JohnPaton
Copy link
Contributor

Awesome!

@mwouts
Copy link
Owner Author

mwouts commented Jan 26, 2021

I had more fun with this tonight!

  • I moved each use case to a new test file to improve the readability of tests (not sure however how many people actually read the tests 😄 )
  • --sync and nbstripout work well together, see test_pre_commit_2_sync_nbstripout.py
  • --sync and nbstripout and black also seem to interact well, however as @JohnPaton previously noticed black needs to be present in jupytext --pipe, cf test_pre_commit_3_sync_black_nbstripout.py
  • maybe I'd like an example of a hook that does not commit the ipynb file, but I don't see how it could work... is it possible for a hook to reset files in the index?
  • I still have to handle a side effect in test 3 - the global pairing causes a modification on the py file the second time the hook runs, see the TODOs there
  • --execute almost works. Thanks to the new --diff argument that displays the change when the file has to be rewritten, we clearly see that this is because of execution timestamps... Do you agree that in --execute --pre-commit-mode we can consider that the notebook has not changed if only the execution timestamps change?

@JohnPaton
Copy link
Contributor

JohnPaton commented Jan 27, 2021

maybe I'd like an example of a hook that does not commit the ipynb file, but I don't see how it could work... is it possible for a hook to reset files in the index?

It's possible but discouraged, basically. Pre-commit themselves are very adamant that the framework should never touch the staging area.

Do you agree that in --execute --pre-commit-mode we can consider that the notebook has not changed if only the execution timestamps change?

I do agree! If the inputs and outputs are all the same, then the notebook hasn't changed imo

@mwouts
Copy link
Owner Author

mwouts commented Jan 31, 2021

It's possible but discouraged, basically.

Ok I see, thanks.

I do agree! If the inputs and outputs are all the same, then the notebook hasn't changed imo

What do you think of skipping the execution if the inputs have not changed, and if the outputs are present on all code cells (and in the correct order)? This way, we don't need to execute twice before committing: 934a2f1

@JohnPaton
Copy link
Contributor

Sorry for radio silence here - my wife had a baby 😁

@Skylion007
Copy link
Contributor

Congrats!

@mwouts
Copy link
Owner Author

mwouts commented Feb 8, 2021

Wow! Congrats John!

Well as you see we finally were able to ship the pre-commit hook in Jupytext v1.10.0. I have made no announcement yet, in part because I wanted to finish the blog post, and also because I wanted to find the appropriate approach for the new cell ids in nbformat 5.1.1 (everything seems fine in the end if one does either --update or --sync, from Jupytext 1.7.1 on).

I hope to come back to you both soon with more updates on the article, but that might take me a few more days/weeks... Slow progresses, but this is a very nice feature, it is worth documenting it well, isn't it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants