Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random failures in tests due to duplicate cell ids #747

Closed
mwouts opened this issue Feb 19, 2021 · 0 comments · Fixed by #751
Closed

Random failures in tests due to duplicate cell ids #747

mwouts opened this issue Feb 19, 2021 · 0 comments · Fixed by #751

Comments

@mwouts
Copy link
Owner

mwouts commented Feb 19, 2021

I've seen already two failures of the CI caused by this. Either we have too many tests and cells without an id, or the probability to get the same id twice is larger than 1 in 5073324.

Maybe we should find a way to initialize the cell id generator to make sure we don't reach this issue in the tests?

In the latest occurrence the stack trace is:

=================================== FAILURES ===================================
_ test_ipynb_to_md[/home/runner/work/jupytext/jupytext/tests/notebooks/ipynb_scheme/Reference Guide for Calysto Scheme.ipynb] _

nb_file = '/home/runner/work/jupytext/jupytext/tests/notebooks/ipynb_scheme/Reference Guide for Calysto Scheme.ipynb'
no_jupytext_version_number = None

    @pytest.mark.parametrize("nb_file", list_notebooks("ipynb_all", skip=""))
    def test_ipynb_to_md(nb_file, no_jupytext_version_number):
>       assert_conversion_same_as_mirror(nb_file, "md", "ipynb_to_md")

/home/runner/work/jupytext/jupytext/tests/test_mirror.py:123: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/home/runner/work/jupytext/jupytext/tests/test_mirror.py:84: in assert_conversion_same_as_mirror
    nb_mirror = jupytext.read(mirror_file, fmt=fmt)
/home/runner/work/jupytext/jupytext/jupytext/jupytext.py:412: in read
    return read(stream, as_version=as_version, fmt=fmt, **kwargs)
/home/runner/work/jupytext/jupytext/jupytext/jupytext.py:421: in read
    return reads(fp.read(), fmt, **kwargs)
/home/runner/work/jupytext/jupytext/jupytext/jupytext.py:374: in reads
    notebook = reader.reads(text, **kwargs)
/home/runner/work/jupytext/jupytext/jupytext/jupytext.py:154: in reads
    return new_notebook(cells=cells, metadata=metadata)
/opt/hostedtoolcache/Python/3.8.7/x64/lib/python3.8/site-packages/nbformat/v4/nbbase.py:164: in new_notebook
    validate(nb)
/opt/hostedtoolcache/Python/3.8.7/x64/lib/python3.8/site-packages/nbformat/v4/nbbase.py:39: in validate
    return validate(node, ref=ref, version=nbformat)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

nbdict = {'nbformat': 4, 'nbformat_minor': 5, 'metadata': {'kernelspec': {'display_name': 'Calysto Scheme (Python)', 'language'...cheme Language](Calysto%20Scheme%20Language.ipynb) for more details on the Calysto Scheme language.", 'metadata': {}}]}
ref = None, version = 4, version_minor = 5, relax_add_props = False
nbjson = None

    def validate(nbdict=None, ref=None, version=None, version_minor=None,
                 relax_add_props=False, nbjson=None):
        """Checks whether the given notebook dict-like object
        conforms to the relevant notebook format schema.
    
    
        Raises ValidationError if not valid.
        """
    
        # backwards compatibility for nbjson argument
        if nbdict is not None:
            pass
        elif nbjson is not None:
            nbdict = nbjson
        else:
            raise TypeError("validate() missing 1 required argument: 'nbdict'")
    
        if ref is None:
            # if ref is not specified, we have a whole notebook, so we can get the version
            nbdict_version, nbdict_version_minor = get_version(nbdict)
            if version is None:
                version = nbdict_version
            if version_minor is None:
                version_minor = nbdict_version_minor
        else:
            # if ref is specified, and we don't have a version number, assume we're validating against 1.0
            if version is None:
                version, version_minor = 1, 0
    
        if ref is None and version >= 4 and version_minor >= 5:
            # if we support cell ids ensure default ids are provided
            for cell in nbdict['cells']:
                if 'id' not in cell:
                    # Generate cell ids if any are missing
                    cell['id'] = generate_corpus_id()
    
        for error in iter_validate(nbdict, ref=ref, version=version,
                                   version_minor=version_minor,
                                   relax_add_props=relax_add_props):
            raise error
    
        if ref is None and version >= 4 and version_minor >= 5:
            # if we support cell ids check for uniqueness when validating the whole notebook
            seen_ids = set()
            for cell in nbdict['cells']:
                cell_id = cell['id']
                if cell_id in seen_ids:
                    cell['id'] = generate_corpus_id()
                    # Best effort to repair if we find a duplicate id in case the ValidationError isn't blocking
>                   raise ValidationError("Non-unique cell id '{}' detected. Corrected to '{}'.".format(cell_id, cell['id']))
E                   jsonschema.exceptions.ValidationError: Non-unique cell id 'changing-upgrade' detected. Corrected to 'divine-manchester'.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant