diff --git a/notebooks/2.0-materials-project-feature-ranges.ipynb b/notebooks/2.0-materials-project-feature-ranges.ipynb index 4b2d46a..2139fbf 100644 --- a/notebooks/2.0-materials-project-feature-ranges.ipynb +++ b/notebooks/2.0-materials-project-feature-ranges.ipynb @@ -1,73 +1,60 @@ { - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "name": "2.0-materials-project-ranges.ipynb", - "provenance": [], - "collapsed_sections": [], - "authorship_tag": "ABX9TyOBcM+CPIOs/3QjWt9ZNGeg", - "include_colab_link": true - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - } - }, "cells": [ { "cell_type": "markdown", "metadata": { - "id": "view-in-github", - "colab_type": "text" + "colab_type": "text", + "id": "view-in-github" }, "source": [ - "\"Open" + "\"Open" ] }, { "cell_type": "markdown", - "source": [ - "# Selecting Parameter Ranges via Materials Project" - ], "metadata": { "id": "n1lGJDVfF7Ph" - } + }, + "source": [ + "# Selecting Parameter Ranges via Materials Project" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "HgEhwhoTGDXY" + }, "source": [ "In this notebook, we'll go over how we selected parameter ranges for some hyperparameters of `xtal2png`, namely the lower and upper bounds of lattice parameter lengths ($a$, $b$, and $c$), cell volume, and site pairwise distances.\n", "\n", "After we've downloaded the data from Materials Project (or loaded it if running the notebook again), we'll extract the parameters from each of the compounds and do some exploratory data analysis. Based on the analysis, we choose to use a quantile as an upper bound on the parameter ranges in order to get rid of outliers. By removing the highest 1% in each parameter category, we retain 97% of the data with fewer than 52 sites. This gives us our final parameter ranges. Finally, we make publication-ready histogram figures and save these." - ], - "metadata": { - "id": "HgEhwhoTGDXY" - } + ] }, { "cell_type": "markdown", - "source": [ - "## Setup" - ], "metadata": { "id": "BhFVzV6RG19u" - } + }, + "source": [ + "## Setup" + ] }, { "cell_type": "markdown", - "source": [ - "Let's keep this notebook compatible both as a Google Colab notebook and running locally as a Jupyter notebook." - ], "metadata": { "id": "0WNd5zah_kN_" - } + }, + "source": [ + "Let's keep this notebook compatible both as a Google Colab notebook and running locally as a Jupyter notebook." 
+ ] }, { "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "jcnLBa4J4-GS" + }, + "outputs": [], "source": [ "from os import path\n", "try:\n", @@ -77,12 +64,7 @@ "except:\n", " IN_COLAB = False\n", " base_dir = path.join(\"data\", \"external\")" - ], - "metadata": { - "id": "jcnLBa4J4-GS" - }, - "execution_count": 6, - "outputs": [] + ] }, { "cell_type": "code", @@ -96,8 +78,8 @@ }, "outputs": [ { - "output_type": "stream", "name": "stdout", + "output_type": "stream", "text": [ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", "Collecting pymatgen\n", @@ -174,24 +156,27 @@ }, { "cell_type": "markdown", - "source": [ - "## Data" - ], "metadata": { "id": "ZzTF5zHgG86G" - } + }, + "source": [ + "## Data" + ] }, { "cell_type": "markdown", - "source": [ - "### Materials Project API Key" - ], "metadata": { "id": "LBmnjDuh_tVy" - } + }, + "source": [ + "### Materials Project API Key" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "eUcufUUb1cbI" + }, "source": [ "Get your [Materials Project API key](https://next-gen.materialsproject.org/api) from a file that you store in your Google Drive (see below) or current directory (`.`), or specify it manually by setting the `api_key` variable in the form field or by running in a local miniconda command prompt with an environment activated that has `pymatgen` installed: `pmg config --add PMG_MAPI_KEY `, e.g. `pmg config --add PMG_MAPI_KEY 123abc456def`. For the latter option, see the [`pymatgen` docs](https://pymatgen.org/usage.html#setting-the-pmg-mapi-key-in-the-config-file).\n", "\n", @@ -202,13 +187,27 @@ "}\n", "```\n", "Note that this file is not necessary locally if you use the `pmg config` option above." - ], - "metadata": { - "id": "eUcufUUb1cbI" - } + ] }, { "cell_type": "code", + "execution_count": 4, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "vEGslJYp0kf2", + "outputId": "549bcc56-5948-47c3-c830-66bae35a51e4" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n" + ] + } + ], "source": [ "import json\n", "if IN_COLAB:\n", @@ -230,53 +229,41 @@ "else:\n", " api_key = None\n", " print(\"make sure that you have run `pmg config --add PMG_MAPI_KEY `\")" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "vEGslJYp0kf2", - "outputId": "549bcc56-5948-47c3-c830-66bae35a51e4" - }, - "execution_count": 4, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n" - ] - } ] }, { "cell_type": "markdown", + "metadata": { + "id": "oc8WaGAEA519" + }, "source": [ "### Download\n", "\n", "Let's either download the data directly from Materials Project using the `MPRester` API or load the data that's been saved previously to your device as `structures.pkl` in your `base_dir`." 
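For context, the diff below elides the body of the download cell between hunks, so here is a minimal sketch of the download-and-cache step it wraps. The exact query criteria are not shown in the diff; the nsites < 52 cutoff and the properties list are assumptions taken from the notebook text, and base_dir / api_key come from the setup cells above.

    # Hedged sketch of the elided MPRester download step (legacy pymatgen API).
    # Assumptions: nsites < 52 cutoff and properties=["structure"], per the notebook text.
    from os import path
    import pickle

    from pymatgen.ext.matproj import MPRester

    pkl_path = path.join(base_dir, "structures.pkl")
    if path.exists(pkl_path):
        with open(pkl_path, "rb") as f:
            results = pickle.load(f)  # reuse cached query results on later runs
    else:
        with MPRester(api_key) as mpr:
            results = mpr.query(
                criteria={"nsites": {"$lt": 52}},  # assumed cutoff (see notebook intro)
                properties=["structure"],          # each entry: {"structure": Structure}
            )
        with open(pkl_path, "wb") as f:
            pickle.dump(results, f)  # cache to base_dir for reuse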
- ], - "metadata": { - "id": "oc8WaGAEA519" - } + ] }, { "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "tkDcY0Oh2YJb" + }, + "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "from tqdm import tqdm\n", "import pickle\n", "from pymatgen.ext.matproj import MPRester" - ], - "metadata": { - "id": "tkDcY0Oh2YJb" - }, - "execution_count": 5, - "outputs": [] + ] }, { "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "QWNv43xaGynE" + }, + "outputs": [], "source": [ "pkl_path = path.join(base_dir, \"structures.pkl\")\n", "try:\n", @@ -293,62 +280,29 @@ " with open(pkl_path, \"wb\") as f:\n", " pickle.dump(results, f)\n", " pass" - ], - "metadata": { - "id": "QWNv43xaGynE" - }, - "execution_count": 7, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "### Extract Lattice and Distances" - ], "metadata": { "id": "D5EWnoqvBFfU" - } + }, + "source": [ + "### Extract Lattice and Distances" + ] }, { "cell_type": "markdown", - "source": [ - "From here, we'll loop through each of the structures and grab the lattice parameter lengths (`a`, `b`, and `c`) as well as the cell volume (`volume`) and pairwise distance matrices between each of the sites for a given structure (`distance`)." - ], "metadata": { "id": "I3kILy6DBNC1" - } + }, + "source": [ + "From here, we'll loop through each of the structures and grab the lattice parameter lengths (`a`, `b`, and `c`) as well as the cell volume (`volume`) and pairwise distance matrices between each of the sites for a given structure (`distance`)." + ] }, { "cell_type": "code", - "source": [ - "a = []\n", - "b = []\n", - "c = []\n", - "volume = []\n", - "distance = []\n", - "\n", - "for s in tqdm(results):\n", - " s = s[\"structure\"]\n", - " lattice = s.lattice\n", - " a.append(lattice.a)\n", - " b.append(lattice.b)\n", - " c.append(lattice.c)\n", - " volume.append(lattice.volume)\n", - " distance.append(s.distance_matrix)\n", - "\n", - "print('range of a is: ', min(a), '-', max(a))\n", - "print('range of b is: ', min(b), '-', max(b))\n", - "print('range of c is: ', min(c), '-', max(c))\n", - "print('range of volume is: ', min(volume), '-', max(volume))\n", - "\n", - "dis_min_tmp = []\n", - "dis_max_tmp = []\n", - "for d in tqdm(range(len(distance))):\n", - " dis_min_tmp.append(min(distance[d][np.nonzero(distance[d])]))\n", - " dis_max_tmp.append(max(distance[d][np.nonzero(distance[d])]))\n", - "\n", - "print('range of pair-wise distance is: ', min(dis_min_tmp), '-', max(dis_max_tmp))" - ], + "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -356,18 +310,17 @@ "id": "NLbckdsGG25r", "outputId": "9b9cf3f5-2b48-4fce-e3cf-bbbabc0af71f" }, - "execution_count": 8, "outputs": [ { - "output_type": "stream", "name": "stderr", + "output_type": "stream", "text": [ "100%|██████████| 106127/106127 [01:19<00:00, 1341.27it/s]\n" ] }, { - "output_type": "stream", "name": "stdout", + "output_type": "stream", "text": [ "range of a is: 2.296021 - 66.29136774227022\n", "range of b is: 2.258778 - 61.125585795588215\n", @@ -376,83 +329,109 @@ ] }, { - "output_type": "stream", "name": "stderr", + "output_type": "stream", "text": [ "100%|██████████| 106127/106127 [00:12<00:00, 8797.44it/s]" ] }, { - "output_type": "stream", "name": "stdout", + "output_type": "stream", "text": [ "range of pair-wise distance is: 0.7249349602879995 - 64.8913973530744\n" ] }, { - "output_type": "stream", "name": "stderr", + "output_type": "stream", "text": [ "\n" ] } + ], + "source": [ + "a = 
[]\n", + "b = []\n", + "c = []\n", + "volume = []\n", + "distance = []\n", + "\n", + "for s in tqdm(results):\n", + " s = s[\"structure\"]\n", + " lattice = s.lattice\n", + " a.append(lattice.a)\n", + " b.append(lattice.b)\n", + " c.append(lattice.c)\n", + " volume.append(lattice.volume)\n", + " distance.append(s.distance_matrix)\n", + "\n", + "print('range of a is: ', min(a), '-', max(a))\n", + "print('range of b is: ', min(b), '-', max(b))\n", + "print('range of c is: ', min(c), '-', max(c))\n", + "print('range of volume is: ', min(volume), '-', max(volume))\n", + "\n", + "dis_min_tmp = []\n", + "dis_max_tmp = []\n", + "for d in tqdm(range(len(distance))):\n", + " dis_min_tmp.append(min(distance[d][np.nonzero(distance[d])]))\n", + " dis_max_tmp.append(max(distance[d][np.nonzero(distance[d])]))\n", + "\n", + "print('range of pair-wise distance is: ', min(dis_min_tmp), '-', max(dis_max_tmp))" ] }, { "cell_type": "markdown", - "source": [ - "## Exploratory Data Analysis" - ], "metadata": { "id": "3Q3zXU8GBXsj" - } + }, + "source": [ + "## Exploratory Data Analysis" + ] }, { "cell_type": "markdown", - "source": [ - "### Setup" - ], "metadata": { "id": "l6VMz2COHGyH" - } + }, + "source": [ + "### Setup" + ] }, { "cell_type": "markdown", - "source": [ - "First, we store the data as a `DataFrame` to make it easier to visualize and apply operations to it." - ], "metadata": { "id": "8iwzfi4oBpuL" - } + }, + "source": [ + "First, we store the data as a `DataFrame` to make it easier to visualize and apply operations to it." + ] }, { "cell_type": "code", - "source": [ - "import plotly.express as px\n", - "df = pd.DataFrame(dict(a=a, b=b, c=c, volume=volume, min_distance=dis_min_tmp, max_distance=dis_max_tmp))" - ], + "execution_count": 9, "metadata": { "id": "s13owEaQHDTb" }, - "execution_count": 9, - "outputs": [] + "outputs": [], + "source": [ + "import plotly.express as px\n", + "df = pd.DataFrame(dict(a=a, b=b, c=c, volume=volume, min_distance=dis_min_tmp, max_distance=dis_max_tmp))" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "KmRBYtl8BvWj" + }, "source": [ "### Min/Max\n", "Next, we take a look at the minimum and maximum for each of the parameters." 
- ], - "metadata": { - "id": "KmRBYtl8BvWj" - } + ] }, { "cell_type": "code", - "source": [ - "low_df = df.apply(np.min).drop(\"max_distance\")\n", - "low_df" - ], + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -460,10 +439,8 @@ "id": "N6sA96JIEYFv", "outputId": "c49091f2-5df8-4cf7-8876-177000613425" }, - "execution_count": null, "outputs": [ { - "output_type": "execute_result", "data": { "text/plain": [ "a 2.296021\n", @@ -474,16 +451,19 @@ "dtype: float64" ] }, + "execution_count": 45, "metadata": {}, - "execution_count": 45 + "output_type": "execute_result" } + ], + "source": [ + "low_df = df.apply(np.min).drop(\"max_distance\")\n", + "low_df" ] }, { "cell_type": "code", - "source": [ - "df.apply(np.max)" - ], + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -491,10 +471,8 @@ "id": "PGQzywtqEsPB", "outputId": "f81d87c7-adab-4dac-c032-2405812b206b" }, - "execution_count": null, "outputs": [ { - "output_type": "execute_result", "data": { "text/plain": [ "a 66.291368\n", @@ -506,37 +484,38 @@ "dtype: float64" ] }, + "execution_count": 12, "metadata": {}, - "execution_count": 12 + "output_type": "execute_result" } + ], + "source": [ + "df.apply(np.max)" ] }, { "cell_type": "markdown", - "source": [ - "The maxima here can be pretty large, for example ~`20000` cubic angstroms for the unit cell volume." - ], "metadata": { "id": "F_hebBRzDobB" - } + }, + "source": [ + "The maxima here can be pretty large, for example ~`20000` cubic angstroms for the unit cell volume." + ] }, { "cell_type": "markdown", + "metadata": { + "id": "2RrXAFhKDz0h" + }, "source": [ "### Histogram\n", "\n", "Let's take a quick look at one of the parameters involved, in this case the `a` lattice parameter length." - ], - "metadata": { - "id": "2RrXAFhKDz0h" - } + ] }, { "cell_type": "code", - "source": [ - "import plotly.express as px\n", - "px.histogram(df, x=\"a\", marginal=\"rug\")" - ], + "execution_count": 68, "metadata": { "colab": { "base_uri": "https://localhost:8080/", @@ -545,10 +524,8 @@ "id": "DCkjsiaPDI1S", "outputId": "8743280e-00a3-49fe-c1c4-71ae231cc892" }, - "execution_count": 68, "outputs": [ { - "output_type": "display_data", "data": { "text/html": [ "\n", @@ -584,35 +561,37 @@ "" ] }, - "metadata": {} + "metadata": {}, + "output_type": "display_data" } + ], + "source": [ + "import plotly.express as px\n", + "px.histogram(df, x=\"a\", marginal=\"rug\")" ] }, { "cell_type": "markdown", - "source": [ - "Clearly, there are outliers." - ], "metadata": { "id": "hAgMygR_DkKD" - } + }, + "source": [ + "Clearly, there are outliers." + ] }, { "cell_type": "markdown", + "metadata": { + "id": "BGbKbIHcB26d" + }, "source": [ "### Quantile Maximum\n", "Since these are some pretty large ranges that will inflate the round-off error of `xtal2png`, let's see if we can filter some of these further by considering only up to a certain percentile (`q` quantile) for the relevant parameters." 
- ], - "metadata": { - "id": "BGbKbIHcB26d" - } + ] }, { "cell_type": "code", - "source": [ - "q = 0.99\n", - "df.apply(lambda a: np.quantile(a, 1 - q)).drop(\"max_distance\")" - ], + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -620,10 +599,8 @@ "id": "_a4TxQhYEufY", "outputId": "6c9c7139-064b-4f50-e7d0-cbac39b9e46f" }, - "execution_count": null, "outputs": [ { - "output_type": "execute_result", "data": { "text/plain": [ "a 2.921745\n", @@ -634,18 +611,19 @@ "dtype: float64" ] }, + "execution_count": 57, "metadata": {}, - "execution_count": 57 + "output_type": "execute_result" } + ], + "source": [ + "q = 0.99\n", + "df.apply(lambda a: np.quantile(a, 1 - q)).drop(\"max_distance\")" ] }, { "cell_type": "code", - "source": [ - "upp_df = df.apply(lambda a: np.quantile(a, q))\n", - "upp_df = upp_df.drop(\"min_distance\")\n", - "upp_df" - ], + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -653,10 +631,8 @@ "id": "Nl0UQKC_E2Wg", "outputId": "cbe89356-a72d-4bd0-9426-c086c61718cb" }, - "execution_count": null, "outputs": [ { - "output_type": "execute_result", "data": { "text/plain": [ "a 15.292415\n", @@ -667,35 +643,38 @@ "dtype: float64" ] }, + "execution_count": 58, "metadata": {}, - "execution_count": 58 + "output_type": "execute_result" } + ], + "source": [ + "upp_df = df.apply(lambda a: np.quantile(a, q))\n", + "upp_df = upp_df.drop(\"min_distance\")\n", + "upp_df" ] }, { "cell_type": "markdown", - "source": [ - "### Data Retention" - ], "metadata": { "id": "96azSaZZHRm4" - } + }, + "source": [ + "### Data Retention" + ] }, { "cell_type": "markdown", - "source": [ - "The ranges are a lot more reasonable now. Let's see how many compounds are retained after applying an upper bound filtering step using this upper quantile." - ], "metadata": { "id": "mZpYn3z2CHYr" - } + }, + "source": [ + "The ranges are a lot more reasonable now. Let's see how many compounds are retained after applying an upper bound filtering step using this upper quantile." 
+ ] }, { "cell_type": "code", - "source": [ - "qstr = \" and \".join([f\"{lbl} < @upp_df.{lbl}\" for lbl in upp_df.index]) # .drop([\"volume\", \"max_distance\"])\n", - "qstr" - ], + "execution_count": 64, "metadata": { "colab": { "base_uri": "https://localhost:8080/", @@ -704,29 +683,29 @@ "id": "c67q0_1MFE1X", "outputId": "16a73310-bfab-4b88-bf42-075dd1b4caf6" }, - "execution_count": 64, "outputs": [ { - "output_type": "execute_result", "data": { - "text/plain": [ - "'a < @upp_df.a and b < @upp_df.b and c < @upp_df.c and volume < @upp_df.volume and max_distance < @upp_df.max_distance'" - ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" - } + }, + "text/plain": [ + "'a < @upp_df.a and b < @upp_df.b and c < @upp_df.c and volume < @upp_df.volume and max_distance < @upp_df.max_distance'" + ] }, + "execution_count": 64, "metadata": {}, - "execution_count": 64 + "output_type": "execute_result" } + ], + "source": [ + "qstr = \" and \".join([f\"{lbl} < @upp_df.{lbl}\" for lbl in upp_df.index]) # .drop([\"volume\", \"max_distance\"])\n", + "qstr" ] }, { "cell_type": "code", - "source": [ - "filt_df = df.query(qstr)\n", - "filt_df" - ], + "execution_count": 65, "metadata": { "colab": { "base_uri": "https://localhost:8080/", @@ -735,27 +714,9 @@ "id": "XITwLdHg9GQF", "outputId": "f2bc7508-0749-4f01-afbf-ac4c462a13d1" }, - "execution_count": 65, "outputs": [ { - "output_type": "execute_result", "data": { - "text/plain": [ - " a b c volume min_distance max_distance\n", - "0 5.189676 5.189676 5.189676 58.128751 2.906562 2.970818\n", - "1 5.388181 5.388181 5.388181 65.313995 3.018739 3.085060\n", - "2 3.300603 3.300603 3.300603 25.425237 2.333879 2.333879\n", - "3 3.498199 3.498199 3.498199 30.270418 2.142200 2.142200\n", - "4 3.510234 3.510234 3.510234 43.252200 3.039952 3.039952\n", - "... ... ... ... ... ... ...\n", - "106122 8.466314 8.603384 8.606069 469.512423 1.502355 6.098565\n", - "106123 8.960343 8.960343 8.960342 466.892631 1.533764 6.157343\n", - "106124 6.874616 7.317851 8.159621 363.700541 0.996728 5.331704\n", - "106125 5.211291 7.406056 10.707537 370.569429 0.983919 5.761587\n", - "106126 5.397578 7.469631 10.283887 371.461746 0.978758 5.621589\n", - "\n", - "[102934 rows x 6 columns]" - ], "text/html": [ "\n", "
\n", @@ -966,19 +927,37 @@ "
\n", " \n", " " + ], + "text/plain": [ + " a b c volume min_distance max_distance\n", + "0 5.189676 5.189676 5.189676 58.128751 2.906562 2.970818\n", + "1 5.388181 5.388181 5.388181 65.313995 3.018739 3.085060\n", + "2 3.300603 3.300603 3.300603 25.425237 2.333879 2.333879\n", + "3 3.498199 3.498199 3.498199 30.270418 2.142200 2.142200\n", + "4 3.510234 3.510234 3.510234 43.252200 3.039952 3.039952\n", + "... ... ... ... ... ... ...\n", + "106122 8.466314 8.603384 8.606069 469.512423 1.502355 6.098565\n", + "106123 8.960343 8.960343 8.960342 466.892631 1.533764 6.157343\n", + "106124 6.874616 7.317851 8.159621 363.700541 0.996728 5.331704\n", + "106125 5.211291 7.406056 10.707537 370.569429 0.983919 5.761587\n", + "106126 5.397578 7.469631 10.283887 371.461746 0.978758 5.621589\n", + "\n", + "[102934 rows x 6 columns]" ] }, + "execution_count": 65, "metadata": {}, - "execution_count": 65 + "output_type": "execute_result" } + ], + "source": [ + "filt_df = df.query(qstr)\n", + "filt_df" ] }, { "cell_type": "code", - "source": [ - "frac_retained = filt_df.shape[0] / df.shape[0]\n", - "print(f\"{100*frac_retained:.1f}% retained\")" - ], + "execution_count": 66, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -986,49 +965,50 @@ "id": "RRXlLWGg8cNA", "outputId": "2684b4c8-d897-4eb7-ff24-d974c88f60e0" }, - "execution_count": 66, "outputs": [ { - "output_type": "stream", "name": "stdout", + "output_type": "stream", "text": [ "97.0% retained\n" ] } + ], + "source": [ + "frac_retained = filt_df.shape[0] / df.shape[0]\n", + "print(f\"{100*frac_retained:.1f}% retained\")" ] }, { "cell_type": "markdown", - "source": [ - "The ranges are much more reasonable now. Also, we have retained ~97% of the original compounds. The other 3% will be much less likely to be represented during generation (i.e. it's been masked from the distribution), although as outliers to begin with it's unclear if most generative models would generate these kinds of compounds anyway. This may be interesting as a topic of future study." - ], "metadata": { "id": "GNdkAzb_EMdp" - } + }, + "source": [ + "The ranges are much more reasonable now. Also, we have retained ~97% of the original compounds. The other 3% will be much less likely to be represented during generation (i.e. it's been masked from the distribution), although as outliers to begin with it's unclear if most generative models would generate these kinds of compounds anyway. This may be interesting as a topic of future study." + ] }, { "cell_type": "markdown", - "source": [ - "## Selected Parameter Ranges" - ], "metadata": { "id": "FAmu-fvkENJx" - } + }, + "source": [ + "## Selected Parameter Ranges" + ] }, { "cell_type": "markdown", - "source": [ - "We'll leave the lower bound as the minimum of all Materials Project entries (with fewer than 52 sites, that is). Alternatively, the lower bound could be set to `0` for each of these." - ], "metadata": { "id": "mHGi3-_aEPaI" - } + }, + "source": [ + "We'll leave the lower bound as the minimum of all Materials Project entries (with fewer than 52 sites, that is). Alternatively, the lower bound could be set to `0` for each of these." + ] }, { "cell_type": "code", - "source": [ - "low_df # i.e. 
minima" - ], + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -1036,10 +1016,8 @@ "id": "-Z5PlMX39W7v", "outputId": "acb80851-276b-4761-8b5f-8c38d6d965e1" }, - "execution_count": null, "outputs": [ { - "output_type": "execute_result", "data": { "text/plain": [ "a 2.296021\n", @@ -1050,16 +1028,18 @@ "dtype: float64" ] }, + "execution_count": 62, "metadata": {}, - "execution_count": 62 + "output_type": "execute_result" } + ], + "source": [ + "low_df # i.e. minima" ] }, { "cell_type": "code", - "source": [ - "upp_df # based on `q` quantile" - ], + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -1067,10 +1047,8 @@ "id": "u15GEynV-JkH", "outputId": "4d55da11-a536-423d-db88-d6ff4b95a479" }, - "execution_count": null, "outputs": [ { - "output_type": "execute_result", "data": { "text/plain": [ "a 15.292415\n", @@ -1081,31 +1059,40 @@ "dtype: float64" ] }, + "execution_count": 63, "metadata": {}, - "execution_count": 63 + "output_type": "execute_result" } + ], + "source": [ + "upp_df # based on `q` quantile" ] }, { "cell_type": "markdown", - "source": [ - "## Plotting Histogram Distributions" - ], "metadata": { "id": "E1T3tM-CC2RC" - } + }, + "source": [ + "## Plotting Histogram Distributions" + ] }, { "cell_type": "markdown", - "source": [ - "Let's plot and save the distributions for the parameters in `upp_df`. First, we define some helper functions to make the figures more compatible with academic publishing and to save them." - ], "metadata": { "id": "OH4y7L-wE5Ep" - } + }, + "source": [ + "Let's plot and save the distributions for the parameters in `upp_df`. First, we define some helper functions to make the figures more compatible with academic publishing and to save them." + ] }, { "cell_type": "code", + "execution_count": 10, + "metadata": { + "id": "QHY6nYm8Jw-L" + }, + "outputs": [], "source": [ "from typing import Union\n", "import plotly.graph_objs as go\n", @@ -1233,24 +1220,24 @@ " )\n", " fig = matplotlibify(fig, **mpl_kwargs)\n", " fig.write_image(fig_path + \".png\")" - ], - "metadata": { - "id": "QHY6nYm8Jw-L" - }, - "execution_count": 10, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "From here, we just loop through the various parameters, plotting and saving histograms as we go. If running on Google Colab, these will be saved to the current directory which is temporary storage that will be purged after the session is closed." - ], "metadata": { "id": "VAOcpPLhFL24" - } + }, + "source": [ + "From here, we just loop through the various parameters, plotting and saving histograms as we go. If running on Google Colab, these will be saved to the current directory which is temporary storage that will be purged after the session is closed." + ] }, { "cell_type": "code", + "execution_count": 69, + "metadata": { + "id": "8sg0N_hVNpw1" + }, + "outputs": [], "source": [ "figs = []\n", "for lbl in df.columns.drop(\"min_distance\"):\n", @@ -1258,27 +1245,20 @@ " fig = matplotlibify(fig)\n", " figs.append(fig)\n", " plot_and_save(lbl+\"_hist\", fig, show=False)" - ], - "metadata": { - "id": "8sg0N_hVNpw1" - }, - "execution_count": 69, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "Here's an example of what the first figure looks like (compare with the histogram from an earlier section in terms of formatting)." 
- ], "metadata": { "id": "pv7e-zSoFrAo" - } + }, + "source": [ + "Here's an example of what the first figure looks like (compare with the histogram from an earlier section in terms of formatting)." + ] }, { "cell_type": "code", - "source": [ - "figs[0]" - ], + "execution_count": 70, "metadata": { "colab": { "base_uri": "https://localhost:8080/", @@ -1287,10 +1267,8 @@ "id": "-wXxfRHC7i_a", "outputId": "6c7900a8-bc4e-4a99-cb61-c670373a2714" }, - "execution_count": 70, "outputs": [ { - "output_type": "display_data", "data": { "text/html": [ "\n", @@ -1326,9 +1304,31 @@ "" ] }, - "metadata": {} + "metadata": {}, + "output_type": "display_data" } + ], + "source": [ + "figs[0]" ] } - ] + ], + "metadata": { + "colab": { + "authorship_tag": "ABX9TyOBcM+CPIOs/3QjWt9ZNGeg", + "collapsed_sections": [], + "include_colab_link": true, + "name": "2.0-materials-project-ranges.ipynb", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 }