
Long validation times, explore using fastjsonschema #190

Closed
goanpeca opened this issue Sep 23, 2020 · 4 comments · Fixed by #191

Comments

@goanpeca
Contributor

goanpeca commented Sep 23, 2020

Hello :-)

In some cases using JLab, when outputs are collected from several nodes (using dask, for example) and errors are found, many tracebacks can populate the output of a given cell. In these cases, where the output is a large list of tracebacks, the validation step can take a significant amount of time.

This is a synthetic notebook, but it illustrates the problem.
50000-errors.ipynb.zip
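
For reference, a notebook like this one can be generated in a few lines. This is a minimal sketch using the public nbformat v4 builders; the attached file may have been produced differently:

import nbformat
from nbformat.v4 import new_code_cell, new_notebook, new_output

# Build a single code cell carrying a large number of error outputs,
# similar in spirit to the attached 50000-errors.ipynb.
outputs = [
    new_output(
        "error",
        ename="ValueError",
        evalue="error %d" % i,
        traceback=["Traceback (most recent call last):", "ValueError: error %d" % i],
    )
    for i in range(50000)
]
nb = new_notebook(cells=[new_code_cell("raise ValueError()", outputs=outputs)])

with open("50000-errors.ipynb", "w", encoding="utf-8") as f:
    nbformat.write(nb, f)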

A script to test this.

Note that nbformat.read already performs validation, so the extra call to nbformat.validate below is only there for timing purposes.

import time

import nbformat

TEST_FILE = '50000-errors.ipynb'


def test():
    as_version = 4
    start_time = time.time()
    print("Start:\t0.00")

    # nbformat.read already validates the notebook while reading it.
    with open(TEST_FILE, 'r', encoding='utf-8') as f:
        model = nbformat.read(f, as_version=as_version)

    print("Open:\t" + str(round(time.time() - start_time, 2)))

    # Extra validation pass, timed separately for comparison.
    nbformat.validate(model)
    print("Valid:\t" + str(round(time.time() - start_time, 2)))


if __name__ == "__main__":
    test()

Yields in seconds:

Start:   0.00
Open:   10.78
Valid:  21.0

Could the use of another validation library like https://github.com/horejsek/python-fastjsonschema be considered, to improve validation performance for cases like the one described?
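
For a sense of what that could look like, here is a minimal sketch that validates the raw notebook JSON with fastjsonschema against the v4 schema file shipped with nbformat. The schema path is an assumption and may vary across nbformat versions:

import json
import time
from pathlib import Path

import fastjsonschema
import nbformat

# Path to the v4 schema bundled with nbformat (assumed location).
schema_path = Path(nbformat.__file__).parent / "v4" / "nbformat.v4.schema.json"
schema = json.loads(schema_path.read_text(encoding="utf-8"))

# Compiling the schema once is the expensive step; the returned callable
# can be reused to validate many notebooks quickly.
validate_nb = fastjsonschema.compile(schema)

with open("50000-errors.ipynb", encoding="utf-8") as f:
    nb = json.load(f)

start = time.time()
try:
    validate_nb(nb)
    print("fastjsonschema valid:\t" + str(round(time.time() - start, 2)))
except fastjsonschema.JsonSchemaException as err:
    print("Invalid notebook:", err.message)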

Thanks!

Pinging @mlucool, @echarles

@MSeal
Contributor

MSeal commented Sep 24, 2020

While I'm not opposed to the idea, validating a 33MB notebook raises the question of why the notebook is that large in the first place. Other mechanisms start failing for notebooks larger than 10MB (browser crashes, transport mechanisms time out, etc.).

@goanpeca
Contributor Author

@MSeal it was a synthetic example meant to really expose the problem.

> validating a 33MB notebook raises the question of why the notebook is that large in the first place.

That is a different issue.

> Other mechanisms start failing for notebooks larger than 10MB (browser crashes, transport mechanisms time out, etc.).

Also a different issue.

> While I'm not opposed to the idea,

Great, I have an open PR; I need to generalize it for use with other available libraries, and then it will be ready for review.
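
For illustration, such a generalization could dispatch on an environment variable and fall back to jsonschema when the faster backend is unavailable. This is a hypothetical sketch; the variable name NBFORMAT_VALIDATOR and the function below are illustrative, not the actual nbformat API:

import os


def make_validator(schema, backend=None):
    # Backend selection via an environment variable is an assumption
    # made for this sketch, not the final nbformat interface.
    backend = backend or os.environ.get("NBFORMAT_VALIDATOR", "jsonschema")
    if backend == "fastjsonschema":
        try:
            import fastjsonschema
            # compile() returns a fast callable: validate(instance)
            return fastjsonschema.compile(schema)
        except ImportError:
            pass  # fall back to the default backend below
    import jsonschema
    # Draft4Validator matches the draft used by the nbformat schemas.
    return jsonschema.Draft4Validator(schema).validate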

@MSeal
Contributor

MSeal commented Sep 24, 2020

Understood. I'll try to take a look. Pro tip: if you reference this issue in the PR, it will generate a link between them and post here that it is linked.

@goanpeca
Contributor Author

goanpeca commented Sep 24, 2020

> if you reference this issue in the PR, it will generate a link between them and post here that it is linked.

Yes, I forgot 🙃

Thanks for the feedback @MSeal.
