add facilities for crystal structures and polymers #33

Open
sgbaird opened this issue Jan 15, 2022 · 3 comments


sgbaird commented Jan 15, 2022

Feature request

Requires valid distance metrics for crystal structures and polymers that encode chemo-structural novelty and polymeric novelty, respectively, as well as structure-based regression models. After that, it's just some basic plumbing.

@sgbaird sgbaird added the enhancement New feature or request label Jan 15, 2022
@sgbaird sgbaird self-assigned this Jan 15, 2022

sgbaird commented Oct 3, 2022

Probably makes sense to default to MEGNet for ease of use. @sp8rks mentioned that the Liverpool group has crystal similarity measures based on an attention network that we can use. Ideally, that crystal similarity measure would be packaged on PyPI (i.e., pip-installable) and expose a function that takes two pymatgen Structure objects and returns a scalar (or takes two lists of pymatgen Structure objects and returns a pairwise distance matrix).
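A minimal sketch of the interface described above, assuming nothing about the Liverpool group's actual package. The names `pairwise_distances` and `toy_distance` are hypothetical, and plain dictionaries stand in for pymatgen Structure objects so the example runs without extra dependencies. The toy metric is only a placeholder, not a real crystal similarity measure, and this version covers the single-list case only.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")  # stands in for pymatgen's Structure; any object type works here


def pairwise_distances(
    structures: Sequence[S],
    distance: Callable[[S, S], float],
) -> List[List[float]]:
    """Build a symmetric pairwise distance matrix from a two-structure distance."""
    n = len(structures)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(structures[i], structures[j])
            matrix[i][j] = d
            matrix[j][i] = d  # assumes the metric is symmetric
    return matrix


def toy_distance(a: dict, b: dict) -> float:
    """Placeholder metric (hypothetical): absolute difference in site count."""
    return float(abs(a["num_sites"] - b["num_sites"]))


# Dictionaries stand in for Structure objects in this sketch.
structures = [{"num_sites": 2}, {"num_sites": 5}, {"num_sites": 9}]
dm = pairwise_distances(structures, toy_distance)
```

Any callable with the same two-argument signature could be passed in place of `toy_distance`, which is what would make the real metric swappable later.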


sgbaird commented Oct 3, 2022

Some places that need to change:

  • anything that hardcodes CrabNet; for example, the following, which would need to become regressor_kwargs instead:
    self.crabnet_kwargs = dict(
        mat_prop=self.mat_prop_name,
        losscurve=False,
        learningcurve=False,
        verbose=self.verbose,
        force_cpu=self.force_cpu,
        epochs=self.epochs,
    )
  • anything that hardcodes ElMD; for example:
    X_train.append(ElMD(comp, metric=self.novelty_prop).feature_vector)
  • probably some other places
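One way the two hardcoded spots listed above could be generalized, sketched under assumptions: `DiscoverSketch`, its `featurizer` argument, and `toy_featurizer` are hypothetical names for illustration and are not part of the mat_discover API. The idea is that the regressor keyword arguments and the featurization step are both supplied by the caller, with the existing CrabNet-style defaults kept as a fallback.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


class DiscoverSketch:
    """Hypothetical, stripped-down stand-in; not the mat_discover Discover class."""

    def __init__(
        self,
        featurizer: Callable[[Any], Sequence[float]],
        regressor_kwargs: Optional[Dict[str, Any]] = None,
        mat_prop_name: str = "test-property",
        verbose: bool = True,
        force_cpu: bool = False,
        epochs: Optional[int] = None,
    ):
        # The featurizer replaces the hardcoded ElMD(...).feature_vector call.
        self.featurizer = featurizer
        if regressor_kwargs is None:
            # Fall back to the keys from the existing crabnet_kwargs.
            regressor_kwargs = dict(
                mat_prop=mat_prop_name,
                losscurve=False,
                learningcurve=False,
                verbose=verbose,
                force_cpu=force_cpu,
                epochs=epochs,
            )
        self.regressor_kwargs = regressor_kwargs

    def featurize(self, items: Sequence[Any]) -> List[List[float]]:
        """Apply the user-supplied featurizer to every item."""
        return [list(self.featurizer(item)) for item in items]


def toy_featurizer(formula: str) -> List[float]:
    """Placeholder featurizer (hypothetical): the length of the formula string."""
    return [float(len(formula))]


default_model = DiscoverSketch(featurizer=toy_featurizer)
custom_model = DiscoverSketch(featurizer=toy_featurizer, regressor_kwargs={"epochs": 10})
X_train = default_model.featurize(["Al2O3", "NaCl"])
```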

Probably best to start by modifying and testing the bare bones example:
https://mat-discover.readthedocs.io/en/latest/examples.html#bare-bones

This is something that a collaborator can modify without knowledge of the mat_discover code architecture.

EDIT: For evaluation metrics, we could keep the element-based metrics and, instead of checking for a new chemical formula, check whether a new space group is represented. It could also be new space group + new number of sites.
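A small sketch of the space-group check suggested in the EDIT above. The function names are hypothetical, and "new space group + new number of sites" is interpreted here as "the (space group, number of sites) pair is unseen in the training data", which is an assumption. Records are plain tuples; with pymatgen they could presumably be built from `Structure.get_space_group_info()` and `Structure.num_sites`.

```python
from typing import Iterable, List, Tuple

Record = Tuple[int, int]  # (space group number, number of sites)


def is_new_space_group(train: Iterable[Record], candidates: Iterable[Record]) -> List[bool]:
    """Flag candidates whose space group never appears in the training data."""
    seen = {space_group for space_group, _ in train}
    return [space_group not in seen for space_group, _ in candidates]


def is_new_space_group_and_sites(
    train: Iterable[Record], candidates: Iterable[Record]
) -> List[bool]:
    """Flag candidates whose (space group, number of sites) pair is unseen."""
    seen = set(train)
    return [record not in seen for record in candidates]


# Illustrative records only; not taken from any dataset.
train = [(225, 2), (225, 8), (62, 20)]
candidates = [(225, 4), (194, 2), (62, 20)]

new_sg = is_new_space_group(train, candidates)
new_pair = is_new_space_group_and_sites(train, candidates)
```

The pair-based check is stricter about what counts as "seen", so it flags the first candidate (known space group, new site count) that the space-group-only check does not.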


sgbaird commented Oct 4, 2022

Based on email discussion:

Taylor brought up some great points, and I think this is an exciting project. There's been a push/encouragement both internal and external to incorporate structure into the search for high-performing, novel materials, and I think this will be a timely extension of DiSCoVeR.

Weighting

How do we weight these? Should it be tunable? Or should it be a fixed ratio of the differences in composition and in structure?

For the weighting, perhaps we could use Chimera as the scalarizing function. Alternatively, I think it would be interesting (and better practice) to treat these two as separate objectives in a multi-objective optimization via, e.g., expected hypervolume improvement, which is a mathematically robust way of collapsing multiple objectives, in the context of observed data, into a single number. Most sophisticated multi-objective optimization platforms take care of expected hypervolume improvement implicitly.
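To make the "single number" idea concrete, here is a minimal sketch of hypervolume improvement for two maximized objectives, e.g. (predicted performance, novelty). This is the deterministic improvement for a point prediction, not expected hypervolume improvement, which additionally averages over the model's predictive distribution. The function names, the reference point, and the example values are all assumptions for illustration.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]  # (objective 1, objective 2), both maximized


def hypervolume_2d(points: Iterable[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded below by the reference point."""
    # Only points that beat the reference in both objectives contribute.
    pts = sorted(
        (p for p in points if p[0] > ref[0] and p[1] > ref[1]),
        key=lambda p: p[0],
        reverse=True,
    )
    hv = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y > best_y:
            # Add the horizontal strip not already covered by points with larger x.
            hv += (x - ref[0]) * (y - best_y)
            best_y = y
    return hv


def hypervolume_improvement(candidate: Point, observed: Iterable[Point], ref: Point) -> float:
    """Gain in dominated area from adding `candidate` to the observed set."""
    observed = list(observed)
    return hypervolume_2d(observed + [candidate], ref) - hypervolume_2d(observed, ref)


# Illustrative values only.
observed = [(3.0, 1.0), (1.0, 3.0)]
ref = (0.0, 0.0)
base_hv = hypervolume_2d(observed, ref)
gain = hypervolume_improvement((2.0, 2.0), observed, ref)
```

A candidate that is dominated by an existing observation yields zero gain, so ranking by this quantity favors candidates that extend the performance/novelty trade-off front.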

Another option would be using an expected improvement acquisition function, except where the novelty proxy takes the place of uncertainty predictions.
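A sketch of that option, using the standard closed-form expected improvement for maximization with the novelty proxy substituted for the predictive standard deviation. The function and parameter names are hypothetical. The sketch also assumes the novelty score has been rescaled to units comparable with the target property, a step the discussion above does not specify.

```python
import math


def expected_improvement(mean: float, spread: float, best: float, xi: float = 0.0) -> float:
    """Closed-form EI for maximization.

    `spread` is normally the predictive standard deviation; here a (rescaled)
    novelty score would be passed in its place.
    """
    improvement = mean - best - xi
    if spread <= 0.0:
        # With no spread, EI reduces to the plain (clipped) improvement.
        return max(improvement, 0.0)
    z = improvement / spread
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return improvement * cdf + spread * pdf


# Two candidates with the same predicted value but different novelty scores.
ei_low_novelty = expected_improvement(mean=1.0, spread=1.0, best=1.0)
ei_high_novelty = expected_improvement(mean=1.0, spread=2.0, best=1.0)
```

With equal predicted performance, the more novel candidate receives the higher score, which is the behavior that motivates the substitution.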

How do we validate performance?

Interesting idea about recognizing new motifs. The structural prototypes from AFLOW seem relevant, since they're going for a set of canonical prototypes IIUC.

Some other issues related to validating performance

Comments on the plumbing to modify for structure:

The easiest place to start testing things out is via the mat_discover bare bones script. Today, I adapted this to use a matbench elasticity dataset with pymatgen Structures, M3GNet instead of CrabNet, and a Euclidean fingerprint-based structural distance instead of ElMD. Everything else is the same.

See the notebook below

https://colab.research.google.com/github/sparks-baird/mat_discover/blob/main/examples/structurally-aware-mat-discover-bare-bones.ipynb

When your structural distance metric of choice is ready, it can replace the fingerprint-based structural distance. After that comes the most difficult part: validation (hence Taylor's comments). Validation can proceed in a similar fashion to the original and/or include some extensions or modifications to how it is performed.
