
Long validation times, explore using fastjsonschema #190

Closed
goanpeca opened this issue Sep 23, 2020 · 4 comments · Fixed by #191

Comments

@goanpeca
Contributor

goanpeca commented Sep 23, 2020

Hello :-)

In some cases using JLab, when outputs are collected from several nodes (using dask, for example) and errors are found, many tracebacks can populate the output of a given cell. In these cases, where the output is a large list of tracebacks, the validation step can take a significant amount of time.

This is a synthetic notebook, but it illustrates the problem.
50000-errors.ipynb.zip
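
For reference, a notebook like this one can be generated in a few lines. This is a minimal sketch using the public nbformat v4 builders; the attached file may have been produced differently:

import nbformat
from nbformat.v4 import new_code_cell, new_notebook, new_output

# Build a single code cell carrying a large number of error outputs,
# similar in spirit to the attached 50000-errors.ipynb.
outputs = [
    new_output(
        "error",
        ename="ValueError",
        evalue="error %d" % i,
        traceback=["Traceback (most recent call last):", "ValueError: error %d" % i],
    )
    for i in range(50000)
]
nb = new_notebook(cells=[new_code_cell("raise ValueError()", outputs=outputs)])

with open("50000-errors.ipynb", "w", encoding="utf-8") as f:
    nbformat.write(nb, f)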

A script to test this.

Note that nbformat.read already performs validation, so the extra call to nbformat.validate below is only there for timing purposes.

import time

import nbformat

TEST_FILE = '50000-errors.ipynb'


def test():
    as_version = 4
    start_time = time.time()
    print("Start:\t0.00")

    # nbformat.read already validates the notebook while reading it.
    with open(TEST_FILE, 'r', encoding='utf-8') as f:
        model = nbformat.read(f, as_version=as_version)

    print("Open:\t" + str(round(time.time() - start_time, 2)))

    # Extra validation pass, timed separately for comparison.
    nbformat.validate(model)
    print("Valid:\t" + str(round(time.time() - start_time, 2)))


if __name__ == "__main__":
    test()

Yields in seconds:

Start:   0.00
Open:   10.78
Valid:  21.0

Could the use of another validation library like https://github.com/horejsek/python-fastjsonschema be considered, to improve validation performance for cases like the one described?
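
For a sense of what that could look like, here is a minimal sketch that validates the raw notebook JSON with fastjsonschema against the v4 schema file shipped with nbformat. The schema path is an assumption and may vary across nbformat versions:

import json
import time
from pathlib import Path

import fastjsonschema
import nbformat

# Path to the v4 schema bundled with nbformat (assumed location).
schema_path = Path(nbformat.__file__).parent / "v4" / "nbformat.v4.schema.json"
schema = json.loads(schema_path.read_text(encoding="utf-8"))

# Compiling the schema once is the expensive step; the returned callable
# can be reused to validate many notebooks quickly.
validate_nb = fastjsonschema.compile(schema)

with open("50000-errors.ipynb", encoding="utf-8") as f:
    nb = json.load(f)

start = time.time()
try:
    validate_nb(nb)
    print("fastjsonschema valid:\t" + str(round(time.time() - start, 2)))
except fastjsonschema.JsonSchemaException as err:
    print("Invalid notebook:", err.message)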

Thanks!

Pinging @mlucool, @echarles

@MSeal
Contributor

MSeal commented Sep 24, 2020

While I'm not opposed to the idea, validating a 33MB notebook raises the question of why the notebook is that large in the first place. Other mechanisms start failing for notebooks larger than 10MB (browser crashes, transport mechanisms time out, etc.).

@goanpeca
Contributor Author

@MSeal it was a synthetic example meant to really expose the problem.

> validating a 33MB notebook raises the question of why the notebook is that large in the first place.

That is a different issue.

> Other mechanisms start failing for notebooks larger than 10MB (browser crashes, transport mechanisms time out, etc.).

Also a different issue.

> While I'm not opposed to the idea,

Great, I have an open PR; I need to generalize it for use with other available libraries, and then it will be ready for review.
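
For illustration, such a generalization could dispatch on an environment variable and fall back to jsonschema when the faster backend is unavailable. This is a hypothetical sketch; the variable name NBFORMAT_VALIDATOR and the function below are illustrative, not the actual nbformat API:

import os


def make_validator(schema, backend=None):
    # Backend selection via an environment variable is an assumption
    # made for this sketch, not the final nbformat interface.
    backend = backend or os.environ.get("NBFORMAT_VALIDATOR", "jsonschema")
    if backend == "fastjsonschema":
        try:
            import fastjsonschema
            # compile() returns a fast callable: validate(instance)
            return fastjsonschema.compile(schema)
        except ImportError:
            pass  # fall back to the default backend below
    import jsonschema
    # Draft4Validator matches the draft used by the nbformat schemas.
    return jsonschema.Draft4Validator(schema).validate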

@MSeal
Contributor

MSeal commented Sep 24, 2020

Understood. I'll try to take a look. Pro tip: if you reference this issue in the PR, it will generate a link between them and post here that it is linked.

@goanpeca
Contributor Author

goanpeca commented Sep 24, 2020

> if you reference this issue in the PR, it will generate a link between them and post here that it is linked.

Yes, I forgot 🙃

Thanks for the feedback @MSeal.
