Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect parsing of badly formatted lattice with 8 entries #6

Closed
jameskermode opened this issue Aug 20, 2021 · 1 comment · Fixed by #7
Closed

Incorrect parsing of badly formatted lattice with 8 entries #6

jameskermode opened this issue Aug 20, 2021 · 1 comment · Fixed by #7

Comments

@jameskermode
Copy link
Member

As reported at libAtoms/ExtXYZ.jl#12, this file contains a typo in the Lattice and should give an error:

[essswb@mnf144 ExtXYZ]$ cat test.xyz
8
Lattice="5.44 0.0 0.0 0.0 5.44 0.0 0.0 0.05.44" Properties=species:S:1:pos:R:3 Time=0.0
Si        0.00000000      0.00000000      0.00000000
Si        1.36000000      1.36000000      1.36000000
Si        2.72000000      2.72000000      0.00000000
Si        4.08000000      4.08000000      1.36000000
Si        2.72000000      0.00000000      2.72000000
Si        4.08000000      1.36000000      4.08000000
Si        0.00000000      2.72000000      2.72000000
Si        1.36000000      4.08000000      4.08000000

However, it is read successfully but gives a strange cell:

[essswb@mnf144 ExtXYZ]$ python3
Python 3.7.4 (default, Mar 26 2020, 11:57:44)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import extxyz
>>> extxyz.read("test.xyz")
Atoms(symbols='Si8', pbc=True, cell=[[5.44, 0.0, 0.0], [0.0, 5.44, 0.0], [0.0, 0.05, 0.44]])

For this to be the case there must be an error in the parser, or even in the grammar.

@bernstei
Copy link
Contributor

bernstei commented Aug 20, 2021

This appears to be a bug in the grammar - the quoted whitespace-separated-float element is a Repeat(Regex..., mi=1)), which allows for a repeat without any whitespace, so it parses 0.050.44 as 0.050 followed by .44, I suspect. These are separated by a wordbreak regex \b. A List(,, delimiter=' ') doesn't seem to work. What does work for parsing is Sequence(Repeat(Regex(re_float+'\s'), mi=0), Regex(re_float))), but that breaks the C and python tree walking code, in different ways, because there's one additional depth nesting of container tree nodes.

I opened a pyleri issue cesbit/pyleri#19 to see whether this can be handled cleanly. The working option above can definitely be made to work, but requires special parsing of this new tree topology. A pyleri.List with whitespace delimiters seems much preferable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants