Random failures in tests due to duplicate cell ids #747

mwouts · 2021-02-19T21:05:31Z

I've seen already two failures of the CI caused by this. Either we have too many tests and cells without an id, or the probability to get the same id twice is larger than 1 in 5073324.

Maybe we should find a way to initialize the cell id generator to make sure we don't reach this issue in the tests?

In the latest occurrence the stack trace is:

=================================== FAILURES ===================================
_ test_ipynb_to_md[/home/runner/work/jupytext/jupytext/tests/notebooks/ipynb_scheme/Reference Guide for Calysto Scheme.ipynb] _

nb_file = '/home/runner/work/jupytext/jupytext/tests/notebooks/ipynb_scheme/Reference Guide for Calysto Scheme.ipynb'
no_jupytext_version_number = None

    @pytest.mark.parametrize("nb_file", list_notebooks("ipynb_all", skip=""))
    def test_ipynb_to_md(nb_file, no_jupytext_version_number):
>       assert_conversion_same_as_mirror(nb_file, "md", "ipynb_to_md")

/home/runner/work/jupytext/jupytext/tests/test_mirror.py:123: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/home/runner/work/jupytext/jupytext/tests/test_mirror.py:84: in assert_conversion_same_as_mirror
    nb_mirror = jupytext.read(mirror_file, fmt=fmt)
/home/runner/work/jupytext/jupytext/jupytext/jupytext.py:412: in read
    return read(stream, as_version=as_version, fmt=fmt, **kwargs)
/home/runner/work/jupytext/jupytext/jupytext/jupytext.py:421: in read
    return reads(fp.read(), fmt, **kwargs)
/home/runner/work/jupytext/jupytext/jupytext/jupytext.py:374: in reads
    notebook = reader.reads(text, **kwargs)
/home/runner/work/jupytext/jupytext/jupytext/jupytext.py:154: in reads
    return new_notebook(cells=cells, metadata=metadata)
/opt/hostedtoolcache/Python/3.8.7/x64/lib/python3.8/site-packages/nbformat/v4/nbbase.py:164: in new_notebook
    validate(nb)
/opt/hostedtoolcache/Python/3.8.7/x64/lib/python3.8/site-packages/nbformat/v4/nbbase.py:39: in validate
    return validate(node, ref=ref, version=nbformat)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

nbdict = {'nbformat': 4, 'nbformat_minor': 5, 'metadata': {'kernelspec': {'display_name': 'Calysto Scheme (Python)', 'language'...cheme Language](Calysto%20Scheme%20Language.ipynb) for more details on the Calysto Scheme language.", 'metadata': {}}]}
ref = None, version = 4, version_minor = 5, relax_add_props = False
nbjson = None

    def validate(nbdict=None, ref=None, version=None, version_minor=None,
                 relax_add_props=False, nbjson=None):
        """Checks whether the given notebook dict-like object
        conforms to the relevant notebook format schema.
    
    
        Raises ValidationError if not valid.
        """
    
        # backwards compatibility for nbjson argument
        if nbdict is not None:
            pass
        elif nbjson is not None:
            nbdict = nbjson
        else:
            raise TypeError("validate() missing 1 required argument: 'nbdict'")
    
        if ref is None:
            # if ref is not specified, we have a whole notebook, so we can get the version
            nbdict_version, nbdict_version_minor = get_version(nbdict)
            if version is None:
                version = nbdict_version
            if version_minor is None:
                version_minor = nbdict_version_minor
        else:
            # if ref is specified, and we don't have a version number, assume we're validating against 1.0
            if version is None:
                version, version_minor = 1, 0
    
        if ref is None and version >= 4 and version_minor >= 5:
            # if we support cell ids ensure default ids are provided
            for cell in nbdict['cells']:
                if 'id' not in cell:
                    # Generate cell ids if any are missing
                    cell['id'] = generate_corpus_id()
    
        for error in iter_validate(nbdict, ref=ref, version=version,
                                   version_minor=version_minor,
                                   relax_add_props=relax_add_props):
            raise error
    
        if ref is None and version >= 4 and version_minor >= 5:
            # if we support cell ids check for uniqueness when validating the whole notebook
            seen_ids = set()
            for cell in nbdict['cells']:
                cell_id = cell['id']
                if cell_id in seen_ids:
                    cell['id'] = generate_corpus_id()
                    # Best effort to repair if we find a duplicate id in case the ValidationError isn't blocking
>                   raise ValidationError("Non-unique cell id '{}' detected. Corrected to '{}'.".format(cell_id, cell['id']))
E                   jsonschema.exceptions.ValidationError: Non-unique cell id 'changing-upgrade' detected. Corrected to 'divine-manchester'.

The text was updated successfully, but these errors were encountered:

mwouts mentioned this issue Mar 5, 2021

Use non-random cell ids in tests #751

Merged

mwouts closed this as completed in #751 Mar 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Random failures in tests due to duplicate cell ids #747

Random failures in tests due to duplicate cell ids #747

mwouts commented Feb 19, 2021

Random failures in tests due to duplicate cell ids #747

Random failures in tests due to duplicate cell ids #747

Comments

mwouts commented Feb 19, 2021