Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CifData: fast scanning and lazy parsing #1190

Merged

Conversation

ltalirz
Copy link
Member

@ltalirz ltalirz commented Feb 23, 2018

Fix #1183
Fix #1186

Fast scanning

Adds a scan_type attribute to CifData.
Using scan_type='flex' uses C-routines of PyCifRW and is several times faster than the python-based implementation for large CIF files.
The default behavior is unchanged.

Example:

from aiida.orm.data.cif import CifData
a = CifData('/path/to/file.cif', scan_type='flex')

Lazy parsing

Adds a parse_policy attribute to CifData.
Using parse_policy='lazy' delays parsing of CIF files until the values are actually needed.
The default behavior is unchanged, except for the fact that the CIF object is now kept in memory after regular parsing (see #1186) (it will, however, not be written to the DB).

Example:

from aiida.orm.data.cif import CifData
a = CifData('/path/to/file.cif', parse_policy='lazy')

Note: 1st commit contains changes from formatter & linter

formatting + fixing minor issues
Example:
from aiida.orm.data.cif import CifData

a = CifData('/path/to/file.cif', scan_type='flex')

Also add test to check that results are identical.
parse_policy decides whether the CIF file is parsed at creation
time of CifData or whether it is parsed only when needed.

Example:

from aiida.orm.data.cif import CifData
a = CifData(file='/path/to/file.cif', parse_policy='lazy')

(default is 'eager', which leaves the code unchanged)
@ltalirz
Copy link
Member Author

ltalirz commented Feb 23, 2018

One change is that the calls to get_formula and get_spacegroup_numbers have moved from set_file into the constructor.
This means, the behavior of a = CifData('/path/to/file.cif' remains the same, but the behavior of
a.set_file('/path/to/file.cif') changes.

The reason for this change is that otherwise one needs to start fighting against the "cleverness" of AiiDA, which calls all the set_... functions in the constructor of the base class (e.g. one would need to make sure that set_parse_policy to be set before set_file).

I'll add something to set_file in order to prevent inconsistencies.

added more tests...
one is currently failing. need to think about how to conveniently
populate formulae and spacegroups
@ltalirz
Copy link
Member Author

ltalirz commented Feb 23, 2018

Closing this PR until convergence reached.

@ltalirz ltalirz closed this Feb 23, 2018
@ltalirz ltalirz reopened this Feb 23, 2018
@ltalirz ltalirz closed this Feb 23, 2018
same for get_spacegroup_numbers

Instead, add a parse() method that will parse the CIF file, if desired.
@ltalirz ltalirz reopened this Feb 23, 2018
@ltalirz
Copy link
Member Author

ltalirz commented Feb 26, 2018

@giovannipizzi Feedback appreciated :-) You are also welcome to assign someone else, if you don't have the time.

@ltalirz ltalirz removed the request for review from giovannipizzi March 1, 2018 15:54
@ltalirz
Copy link
Member Author

ltalirz commented Mar 1, 2018

@nmounet If you are not available, please unassign yourself and I will ask someone else.

@ltalirz
Copy link
Member Author

ltalirz commented Mar 7, 2018

@giovannipizzi It seems the Nicolas currently has no time.
Should we go ahead with this? I'm happy to add a bit of documentation here as well, if we agree on the implementation.

giovannipizzi
giovannipizzi previously approved these changes Mar 7, 2018
@giovannipizzi
Copy link
Member

Do you know why tests are failing? (Probably when you'll fix them I have to approve again)

@ltalirz
Copy link
Member Author

ltalirz commented Mar 7, 2018

Yes... I removed the main_file_name argument of _prepare_cif because it is not used (the function just returns the cif string).
Now I see that there are many similar functions like this... I guess removing it is the right thing to do, but it might break other code...

Many of the `prepare_...` functions have a `main_file_name` argument.
It's completely unused in CifData, but I'm putting it back for the
moment in order to avoid breaking other people's code.
@giovannipizzi giovannipizzi merged commit 1e72b3a into aiidateam:release_v0.11.1 Mar 7, 2018
@ltalirz ltalirz deleted the issue_1186_flexible_cif_class branch April 12, 2018 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants