Lattice xgboost, a baseline using lattice parameters, space group number, and unit cell volume #152

Merged: 13 commits, Jul 27, 2022


@sgbaird
Contributor

sgbaird commented Jun 14, 2022

eXtreme Gradient Boosting (XGBoost) trees are applied to basic tabular data describing the crystal lattice of each compound: lattice parameter lengths and angles, space group number, and unit cell volume. Fixed XGBoost hyperparameters were used. This serves as part of a baseline to answer the question: how much predictive power is contained in the basic details of a crystal lattice alone (i.e. no composition and no site information)?
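For illustration, the tabular features described above could be assembled roughly as follows. This is a minimal sketch, not the code from this PR: the function names `unit_cell_volume` and `lattice_features`, the column order, and the example cell are assumptions. The volume uses the standard formula for a general (triclinic) cell.

```python
import math
from typing import List


def unit_cell_volume(a: float, b: float, c: float,
                     alpha: float, beta: float, gamma: float) -> float:
    """Volume of a general (triclinic) unit cell; angles are in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def lattice_features(a: float, b: float, c: float,
                     alpha: float, beta: float, gamma: float,
                     space_group_number: int) -> List[float]:
    """Hypothetical feature row: lengths, angles, space group number, volume."""
    volume = unit_cell_volume(a, b, c, alpha, beta, gamma)
    return [a, b, c, alpha, beta, gamma, float(space_group_number), volume]


# Example: a made-up cubic cell with a 4.0 angstrom edge in space group 225.
row = lattice_features(4.0, 4.0, 4.0, 90.0, 90.0, 90.0, 225)
```

One such row per structure gives a plain numeric table that a gradient-boosted regressor such as `xgboost.XGBRegressor` could be fit on, with formation energy as the target. The fixed hyperparameters used in the PR are not stated in this thread, so none are shown here.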

This is designed for use on the matbench_mp_e_form task, offering a perspective from a more established domain (i.e. model accuracy) than generative model benchmarking. It is part of a series of baselines and tests related to the xtal2png representation.

sparks-baird/xtal2png#51

Authored primarily by @cseeg

@sgbaird
Contributor Author

sgbaird commented Jun 14, 2022

@ardunn tests are failing, which seems related to matminer, e.g.:

======================================================================
ERROR: test_has_polymorphs (matbench.tests.test_task.TestMatbenchTask)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/matbench/matbench/matbench/tests/test_task.py", line 464, in test_has_polymorphs
    mbt = MatbenchTask("matbench_steels", autoload=True)
  File "/home/runner/work/matbench/matbench/matbench/task.py", line 89, in __init__
    self.df = load(self.dataset_name) if autoload else None
  File "/home/runner/work/matbench/matbench/matbench/data_ops.py", line 66, in load
    df = load_dataset(dataset_name)
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/matminer/datasets/dataset_retrieval.py", line 66, in load_dataset
    _validate_dataset(
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/matminer/datasets/utils.py", line 89, in _validate_dataset
    raise UserWarning(
UserWarning: Error, hash of downloaded file does not match that included in metadata, the data may be corrupt or altered
======================================================================
ERROR: test_instantiation (matbench.tests.test_task.TestMatbenchTask)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/matbench/matbench/matbench/tests/test_task.py", line 35, in test_instantiation
    MatbenchTask(ds, autoload=True)
  File "/home/runner/work/matbench/matbench/matbench/task.py", line 89, in __init__
    self.df = load(self.dataset_name) if autoload else None
  File "/home/runner/work/matbench/matbench/matbench/data_ops.py", line 66, in load
    df = load_dataset(dataset_name)
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/matminer/datasets/dataset_retrieval.py", line 66, in load_dataset
    _validate_dataset(
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/matminer/datasets/utils.py", line 89, in _validate_dataset
    raise UserWarning(
UserWarning: Error, hash of downloaded file does not match that included in metadata, the data may be corrupt or altered
======================================================================
ERROR: test_record (matbench.tests.test_task.TestMatbenchTask)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/matbench/matbench/matbench/tests/test_task.py", line 211, in test_record
    mbt.load()
  File "/home/runner/work/matbench/matbench/matbench/task.py", line 235, in load
    self.df = load(self.dataset_name)
  File "/home/runner/work/matbench/matbench/matbench/data_ops.py", line 66, in load
    df = load_dataset(dataset_name)
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/matminer/datasets/dataset_retrieval.py", line 66, in load_dataset
    _validate_dataset(
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/matminer/datasets/utils.py", line 89, in _validate_dataset
    raise UserWarning(
UserWarning: Error, hash of downloaded file does not match that included in metadata, the data may be corrupt or altered
----------------------------------------------------------------------
Ran 30 tests in 73.767s
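
All three errors above come from the same check: the hash of the downloaded dataset file did not match the hash recorded in the metadata. A minimal sketch of that kind of check, assuming a SHA-256 hex digest is what the metadata stores; `file_sha256` and `validate_file_hash` are hypothetical names for illustration, not matminer's API:

```python
import hashlib
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_file_hash(path: str, expected_hash: str) -> None:
    """Raise UserWarning (as in the traceback) if the file hash differs."""
    actual = file_sha256(path)
    if actual != expected_hash:
        raise UserWarning(
            f"hash mismatch for {path}: expected {expected_hash}, got {actual}"
        )


# Demo on a throwaway file standing in for a downloaded dataset.
payload = b"example dataset bytes"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

good_hash = hashlib.sha256(payload).hexdigest()
validate_file_hash(demo_path, good_hash)  # matching hash: no exception

try:
    validate_file_hash(demo_path, "0" * 64)  # deliberately wrong hash
    mismatch_detected = False
except UserWarning:
    mismatch_detected = True

os.remove(demo_path)
```

A mismatch like this can mean a truncated or stale download, or metadata that no longer matches the hosted file, so comparing the two digests directly is a reasonable first debugging step.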

@ardunn
Collaborator

ardunn commented Jul 22, 2022

@sgbaird Thanks for the PR! Let me see if I can fix this and I'll merge this in.

@sgbaird
Contributor Author

sgbaird commented Jul 25, 2022

Thanks!

@ardunn ardunn merged commit 02fb7ec into materialsproject:main Jul 27, 2022
@ardunn
Collaborator

ardunn commented Jul 27, 2022

Merged! Not sure what was going on with the tests, maybe some sort of version issue. I was able to pass all the tests on your branch on my machine, so it's probably just a CI problem, which I'll debug.

@cseeg
Contributor

cseeg commented Jul 27, 2022

Sweet

@sgbaird sgbaird deleted the lattice-xgboost branch July 27, 2022 17:42