Incorrect parsing of badly formatted lattice with 8 entries #6

jameskermode · 2021-08-20T12:14:57Z

As reported at libAtoms/ExtXYZ.jl#12, this file contains a typo in the Lattice and should give an error:

[essswb@mnf144 ExtXYZ]$ cat test.xyz
8
Lattice="5.44 0.0 0.0 0.0 5.44 0.0 0.0 0.05.44" Properties=species:S:1:pos:R:3 Time=0.0
Si        0.00000000      0.00000000      0.00000000
Si        1.36000000      1.36000000      1.36000000
Si        2.72000000      2.72000000      0.00000000
Si        4.08000000      4.08000000      1.36000000
Si        2.72000000      0.00000000      2.72000000
Si        4.08000000      1.36000000      4.08000000
Si        0.00000000      2.72000000      2.72000000
Si        1.36000000      4.08000000      4.08000000

However, it is read successfully but gives a strange cell:

[essswb@mnf144 ExtXYZ]$ python3
Python 3.7.4 (default, Mar 26 2020, 11:57:44)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import extxyz
>>> extxyz.read("test.xyz")
Atoms(symbols='Si8', pbc=True, cell=[[5.44, 0.0, 0.0], [0.0, 5.44, 0.0], [0.0, 0.05, 0.44]])

For this to be the case there must be an error in the parser, or even in the grammar.

bernstei · 2021-08-20T14:15:11Z

This appears to be a bug in the grammar - the quoted whitespace-separated-float element is a Repeat(Regex..., mi=1)), which allows for a repeat without any whitespace, so it parses 0.050.44 as 0.050 followed by .44, I suspect. These are separated by a wordbreak regex \b. A List(,, delimiter=' ') doesn't seem to work. What does work for parsing is Sequence(Repeat(Regex(re_float+'\s'), mi=0), Regex(re_float))), but that breaks the C and python tree walking code, in different ways, because there's one additional depth nesting of container tree nodes.

I opened a pyleri issue cesbit/pyleri#19 to see whether this can be handled cleanly. The working option above can definitely be made to work, but requires special parsing of this new tree topology. A pyleri.List with whitespace delimiters seems much preferable.

bernstei mentioned this issue Aug 25, 2021

Fix parsing of floats that run together in a backward compatible array #7

Merged

jameskermode closed this as completed in #7 Sep 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect parsing of badly formatted lattice with 8 entries #6

Incorrect parsing of badly formatted lattice with 8 entries #6

jameskermode commented Aug 20, 2021

bernstei commented Aug 20, 2021 •

edited

Loading

Incorrect parsing of badly formatted lattice with 8 entries #6

Incorrect parsing of badly formatted lattice with 8 entries #6

Comments

jameskermode commented Aug 20, 2021

bernstei commented Aug 20, 2021 • edited Loading

bernstei commented Aug 20, 2021 •

edited

Loading