This repository contains two Jupyter notebooks for working with paleo-crustal thickness data. One notebook predicts crustal thickness with a machine learning model, and the other visualizes its spatial and temporal evolution.
- Requirements
- Installation
- Usage
- Paleo_Crustal_Thickness-prediction.ipynb
- Spatial_Temporal evolution of the Paleo Crustal Thickness.ipynb
- Data Requirements
- Output Files
- Troubleshooting
- License
To successfully run these notebooks, you need to have Python 3.7 or above installed on your system.
Required Python Packages
- Python 3.7+
- Jupyter Notebook (for running the notebooks interactively)
- pandas
- numpy
- matplotlib
- catboost (for Paleo_Crustal_Thickness-prediction.ipynb)
- scikit-learn (for additional model evaluation if needed)
To install these dependencies, you can create a virtual environment and install the required packages using the following commands:
- python -m venv crustal_env
- source crustal_env/bin/activate # On Windows, use crustal_env\Scripts\activate
- pip install pandas numpy matplotlib catboost scikit-learn jupyter
The pip command above already includes Jupyter. If you left it out, you can install it within the environment and launch it with:
- pip install jupyter
- jupyter notebook
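Before launching the notebooks, it can help to confirm that the packages listed above are visible from the active environment. The snippet below is a minimal sketch using only the standard library; the helper name `find_missing_modules` is hypothetical and not part of this repository. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names for the analysis packages listed in the requirements.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib", "catboost", "sklearn"]


def find_missing_modules(module_names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it with the virtual environment activated; any name it reports can be installed with pip as shown above.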
- The notebooks require datasets in CSV format. Ensure that your data includes the relevant columns, such as geochemical elements and crustal thickness measurements.
- Paleo_Crustal_Thickness-prediction.ipynb: This notebook builds a machine learning model (CatBoost) to predict Paleo crustal thickness using geochemical element data.
- Spatial_Temporal evolution of the Paleo Crustal Thickness.ipynb: This notebook visualizes the spatial and temporal correlation between geologic age, crustal thickness, latitude, and longitude using scatter plots.
The Paleo_Crustal_Thickness-prediction.ipynb notebook predicts Paleo crustal thickness from geochemical measurements using a machine learning approach. Follow the steps below to run it successfully:
Ensure you have a CSV file with whole-rock geochemical data for paleo-crustal thickness estimation. The dataset should include the major and trace elements (such as SiO₂, TiO₂, Al₂O₃, etc.) as input features, in the same column order as the Model 1 dataset, and a column labeled Crustal_Thickness for the target variable. If only a limited set of geochemical elements is available, make sure that at least the following are provided for the LASSO-CV-based estimation model: Sr/Y, (La/Yb)n, Rb/Sr, Lu/Hf, Nd/Y, Th/Yb, Ba/V, A/CaO, Cr/Sc, Cr/V, MgO, CaO, K2O, P2O5, Al2O3, MnO, Ho, Lu, Sr, Y, Rb, Ba, Nb, Pb, Sc, V, Ni, A, Nb/Yb, Zr/Y, La/Sm, Dy/Yb, and Sm/Yb.
In the notebook, update the dataFile path to point to your dataset. Example: dataFile = '/path/to/your/dataset.csv'
- Load the dataset and check for missing values (NaN).
- Extract features (geochemical elements) and the target (crustal thickness).
- Train a CatBoost model to predict crustal thickness based on the geochemical inputs.
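The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: the function names are hypothetical, every non-target column is assumed to be a numeric feature, rows with missing values are simply dropped, and the CatBoost hyperparameters are placeholders.

```python
import pandas as pd

TARGET_COLUMN = "Crustal_Thickness"  # target column name given in this README


def load_dataset(data_file):
    """Load the CSV and report how many values are missing in each column."""
    df = pd.read_csv(data_file)
    missing = df.isna().sum()
    print(missing[missing > 0])  # an empty Series means nothing is missing
    return df


def split_features_target(df, target_column=TARGET_COLUMN):
    """Separate the geochemical feature columns from the target column."""
    clean = df.dropna()  # assumption: incomplete rows are dropped
    X = clean.drop(columns=[target_column])
    y = clean[target_column]
    return X, y


def train_model(X, y):
    """Fit a CatBoost regressor; the hyperparameters are placeholders."""
    # Imported here so the data checks above can run without CatBoost installed.
    from catboost import CatBoostRegressor

    model = CatBoostRegressor(iterations=500, learning_rate=0.05,
                              verbose=0, random_seed=0)
    model.fit(X, y)
    return model
```

A typical call sequence would be `df = load_dataset(dataFile)`, then `X, y = split_features_target(df)`, then `model = train_model(X, y)`.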
You can modify the machine learning workflow, for example by adding hyperparameter tuning or cross-validation, to optimize the model further.
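As one possible way to add cross-validation, the sketch below uses scikit-learn's `KFold` and `cross_val_score`. The helper name `cross_validated_rmse`, the fold count, and the choice of RMSE as the metric are assumptions made for illustration; the notebook itself may evaluate the model differently.

```python
from sklearn.model_selection import KFold, cross_val_score


def cross_validated_rmse(model, X, y, n_splits=5, seed=0):
    """Return the per-fold RMSE of `model` under shuffled K-fold cross-validation."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # scikit-learn reports errors as negative scores so that higher is better.
    scores = cross_val_score(model, X, y, cv=folds,
                             scoring="neg_root_mean_squared_error")
    return -scores
```

Any estimator that follows the scikit-learn interface can be passed in, for example `cross_validated_rmse(CatBoostRegressor(verbose=0), X, y)`. Comparing the mean RMSE across different hyperparameter settings is a simple form of tuning.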
Once the model is trained, it can be used to predict crustal thickness for new or unseen geochemical data. You can save the trained model and use it in future predictions.
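Saving and reusing the model might look like the sketch below, which relies on CatBoost's `save_model` and `load_model` methods. The function names and the file path are hypothetical. The new data is assumed to have the same feature columns, in the same order, as the training data.

```python
def save_trained_model(model, model_path):
    """Write a fitted CatBoost model to disk in CatBoost's native format."""
    model.save_model(model_path)


def predict_from_saved_model(model_path, new_features):
    """Reload a saved model and predict crustal thickness for new samples."""
    from catboost import CatBoostRegressor  # requires CatBoost to be installed

    model = CatBoostRegressor()
    model.load_model(model_path)
    return model.predict(new_features)
```

For example, call `save_trained_model(model, "crustal_model.cbm")` after training and `predict_from_saved_model("crustal_model.cbm", X_new)` in a later session.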
The Spatial_Temporal evolution of the Paleo Crustal Thickness.ipynb notebook visualizes the spatial and temporal evolution of the paleo-crustal thickness. It generates scatter plots showing the relationship between geographic coordinates (latitude and longitude), geologic age, and median crustal thickness.
Steps to Use:
Ensure that your dataset includes columns for Age, Longitude, Latitude, and Median_Crustal_Thickness.
Example column headers: Predicted_Crustal_Thickness, Age, Age error, Lat, Lon
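A quick way to confirm that a file has the expected columns before plotting is sketched below. The helper `missing_columns` is hypothetical, and the default names are taken from the column list above; change them if your file uses different headers.

```python
import pandas as pd

# Column names taken from the column list above; adjust them to match your file.
REQUIRED_COLUMNS = ["Predicted_Crustal_Thickness", "Age", "Age error", "Lat", "Lon"]


def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the DataFrame."""
    return [name for name in required if name not in df.columns]
```

After `df = pd.read_csv(dataFile)`, an empty list from `missing_columns(df)` means all expected columns are present.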
Update the file path in the notebook to point to your dataset. Example: dataFile = '/path/to/your/dataset.csv'
The notebook will then generate the scatter plots described above.
You can adjust the axes limits, color schemes, and minor tick locators depending on your preferences.
The notebook saves the resulting plots as a PDF. Make sure to specify your desired output file path in the code where the PDF is saved.
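To show where these adjustments fit, here is a minimal plotting sketch. It does not reproduce the notebook's figures: the two-panel layout, the `viridis` colormap, the function name, and the default column names are all assumptions chosen for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import AutoMinorLocator


def plot_thickness_evolution(df, pdf_path, age_col="Age", lat_col="Lat",
                             lon_col="Lon",
                             thickness_col="Predicted_Crustal_Thickness"):
    """Scatter latitude and longitude against age, colored by crustal thickness."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True)
    for ax, coord_col in zip(axes, (lat_col, lon_col)):
        points = ax.scatter(df[age_col], df[coord_col], c=df[thickness_col],
                            cmap="viridis", s=15)
        ax.set_xlabel(age_col)
        ax.set_ylabel(coord_col)
        # Minor ticks on both axes; swap in another locator if preferred.
        ax.xaxis.set_minor_locator(AutoMinorLocator())
        ax.yaxis.set_minor_locator(AutoMinorLocator())
    fig.colorbar(points, ax=axes, label=thickness_col)
    fig.savefig(pdf_path)  # the .pdf extension selects the PDF writer
    plt.close(fig)
```

Axis limits can be set with `ax.set_xlim(...)` and `ax.set_ylim(...)` inside the loop, and the color scheme can be changed through the `cmap` argument.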
To run the notebooks:
- Clone this repository or download the notebooks.
- Navigate to the directory where the notebooks are stored.
- Launch Jupyter Notebook: jupyter notebook
- Open either the Paleo_Crustal_Thickness-prediction.ipynb or Spatial_Temporal evolution of the Paleo Crustal Thickness.ipynb notebook and run the cells in order.
- Paleo_Crustal_Thickness-prediction.ipynb: The output is a machine learning model trained to predict crustal thickness based on geochemical data. You can use this model to make predictions on new data.
- Spatial_Temporal evolution of the Paleo Crustal Thickness.ipynb: The output is a PDF file containing the scatter plots visualizing the spatial-temporal correlation of crustal thickness.
Ensure that you update the file paths for your datasets and output files as needed in each notebook.
- Ensure that your data is correctly formatted and that there are no missing values in critical columns.
- Check that the Python environment has the correct versions of the required libraries installed.
- If you encounter memory issues while processing large datasets, consider running the notebooks on a machine with more RAM or splitting the dataset into smaller chunks.
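For the memory tip above, pandas can read a CSV in pieces through the `chunksize` argument of `read_csv`. The sketch below counts missing values per column without loading the whole file at once; the function name and the default chunk size are hypothetical.

```python
import pandas as pd


def count_missing_in_chunks(data_file, chunk_size=50_000):
    """Count missing values per column while reading the CSV piece by piece."""
    totals = None
    for chunk in pd.read_csv(data_file, chunksize=chunk_size):
        counts = chunk.isna().sum()
        totals = counts if totals is None else totals + counts
    return totals
```

The same loop structure can be used for other per-chunk work, such as filtering rows before combining the results.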