📦 CoastTrain Metadata Plots

🌟 Highlights

  • Jupyter notebooks for visualizing Coast Train data and metadata
  • scripts for computing inter-labeler agreement and making montage figures
  • works with Doodler and Segmentation Gym
  • models trained using some Coast Train version 1 data sets are included in Segmentation Zoo

ℹ️ Overview

This repository contains Jupyter notebooks and Python scripts to create the analyses and plots in Buscombe et al., "'Coast Train', a 1.2 Billion Pixel Human-Labeled Dataset for Data-Driven Classification of Coastal Environments", in review.

Note that this repository contains code only: it recreates the plots in the aforementioned paper and provides a programmatic way to query and search the dataset for custom applications. For details about how to access the Coast Train version 1 data themselves, please refer to the Coast Train website, which describes where to download the data and how to unpack them using the companion program Doodler.

✍️ Authors

Package maintainers:

⬇️ Installation

Download

git clone --depth 1 https://github.com/dbuscombe-usgs/CoastTrainMetaPlots.git

Conda environment

In the terminal:

conda env create --file env/coasttrain.yml

Once the environment is installed (this may take a while), activate it like this:

conda activate coasttrain

Doodler conda environment

We also advise creating the Doodler conda environment to run the programs. See the installation instructions.

🚀 Usage

Metadata

The metadata files are the same as those provided in the official data release but are included here for convenience. Please refer to the Coast Train website, which describes where to download the data and how to unpack them using the companion program Doodler. The csv files contain the following fields:

| Variable | Description |
| --- | --- |
| 'annotation_image_filename' | npz format file containing the label data archive |
| 'classes_array' | names of possible classes in this dataset |
| 'classes_integer' | one integer per element in 'classes_array' |
| 'classes_present_array' | names of the classes present in this image |
| 'classes_present_integer' | one integer per element in 'classes_present_array' |
| 'pen_width' | final width in pixels of the pen used to annotate |
| 'CRF_theta', 'CRF_mu', 'CRF_downsample_factor', 'Classifier_downsample_factor', 'prob_of_unary_potential', 'num_of_scales' | internal classifier hyperparameters used by the Doodler program |
| 'num_classes' | number of possible classes in this dataset |
| 'doodle_spatial_density' | proportion of the image annotated |
| 'acc_georef' | accuracy in meters of the specification of 'XMin, XMax' and 'YMin, YMax' |
| 'epsg' | EPSG code of the projected coordinate system 'CRS' |
| 'year, month, day' | time variables |
| 'hour, minute, second' | time variables |
| 'XMin, XMax' | minimum and maximum Easting of image footprint |
| 'YMin, YMax' | minimum and maximum Northing of image footprint |
| 'LonMin, LonMax' | minimum and maximum Longitude (WGS84) of image footprint |
| 'LatMin, LatMax' | minimum and maximum Latitude (WGS84) of image footprint |
| 'CRS' | description of the projected coordinate system relating to 'XMin, XMax' and 'YMin, YMax' |
| 'px_size_m' | horizontal size of pixel in meters |
| 'ImageHeightPx', 'ImageWidthPx', 'ImageBands' | number of pixels in the Y (height) and X (width) dimensions, and the number of bands (always 3) |
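
As a rough illustration of querying the metadata programmatically, the sketch below filters records by a longitude/latitude bounding box using only the Python standard library. It assumes the csv column headers match the field names listed above (LonMin, LonMax, LatMin, LatMax, annotation_image_filename), which should be checked against the actual files. The two rows are synthetic stand-ins rather than real Coast Train records, and `records_in_bbox` is a hypothetical helper that is not part of this repository.

```python
import csv
import io

# Synthetic stand-in for a metadata csv file; real files have many more columns.
SAMPLE_CSV = """annotation_image_filename,LonMin,LonMax,LatMin,LatMax
a.npz,-123.60,-123.50,48.10,48.20
b.npz,-70.50,-70.40,41.70,41.80
"""

def records_in_bbox(csv_file, lon_min, lon_max, lat_min, lat_max):
    """Return rows whose image footprint overlaps the query box (WGS84 degrees)."""
    hits = []
    for row in csv.DictReader(csv_file):
        overlaps_lon = float(row["LonMin"]) <= lon_max and float(row["LonMax"]) >= lon_min
        overlaps_lat = float(row["LatMin"]) <= lat_max and float(row["LatMax"]) >= lat_min
        if overlaps_lon and overlaps_lat:
            hits.append(row)
    return hits

# Hypothetical query box; swap io.StringIO(...) for open("path/to/file.csv").
matches = records_in_bbox(io.StringIO(SAMPLE_CSV), -124.0, -123.0, 48.0, 49.0)
print([row["annotation_image_filename"] for row in matches])
```

The same pattern extends to other fields, for example keeping only rows whose 'doodle_spatial_density' exceeds a threshold.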

Notebooks

The notebooks read the metadata files in the metadata folder. To run them, launch a Jupyter server from your terminal:

cd notebooks
jupyter notebook

plot_class_distribution.ipynb

Allows analysis of the class-image distributions for each data record in turn and overall. Generates the following plots:

  • plots/NumLabel_all_datarecords_per_superlabel.png
  • plots/Num_images_per_datarecord_containing_superclass.png
  • plots/Num_images_per_datarecord_containing_class.png
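
The kind of tally behind a "number of images containing each class" plot can be sketched as follows. This is a minimal illustration and not the notebook's actual code: it assumes the 'classes_present_array' field has already been parsed into a list of class names per image, and the class names and records shown are invented for the example.

```python
from collections import Counter

# Synthetic records: each entry lists the class names present in one image.
# In practice these would be parsed from 'classes_present_array'.
records = [
    ["water", "sand"],
    ["water", "vegetation"],
    ["sand"],
]

def images_per_class(class_lists):
    """Count how many images contain each class (each class counted once per image)."""
    counts = Counter()
    for classes in class_lists:
        counts.update(set(classes))
    return counts

counts = images_per_class(records)
for name, n in sorted(counts.items()):
    print(f"{name}: {n}")
```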

plot_geographic_distribution.ipynb

Allows analysis of the geographic-image distributions for each data record in turn and overall. Generates the following plots:

  • plots/Map_satellite_imagery_folium.png
  • plots/All_imagery_by_lat_and_lon.png

plot_user_distribution.ipynb

Allows analysis of the anonymized labeler-image distributions for each data record in turn and overall. Generates the following plots:

  • plots/Label_all_million_pixels_datarecords_per_ID.png
  • plots/Label_per_datarecord_per_ID.png
  • plots/Label_all_datarecords_per_ID.png
  • plots/Million_pixels_vs_percentage_doodled.png
  • plots/agreement_stats_coasttrain_naip_s2.png

plot_image_locations.ipynb

This notebook lets you visualize where each image is located on a map, one image at a time.

Scripts

The scripts for computing inter-labeler agreement and making montage figures are run from the command line. Before running them, edit the paths in each script to point to the location on your local filesystem where you downloaded the Coast Train npz files.
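
Before editing the paths, it can help to confirm that the npz files are where you expect them. The sketch below recursively lists npz files under a root folder. `find_npz_files` is a hypothetical helper, and the temporary directory and file names exist only for the demonstration; the scripts themselves may organize their paths differently.

```python
from pathlib import Path
import tempfile

def find_npz_files(data_root):
    """Recursively collect .npz label archives under data_root, sorted by path."""
    return sorted(Path(data_root).rglob("*.npz"))

# Demonstration on a throwaway directory; replace with your own download location.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "naip").mkdir()
    (Path(tmp) / "naip" / "example_a.npz").touch()
    (Path(tmp) / "naip" / "notes.txt").touch()
    found = [p.name for p in find_npz_files(tmp)]
    print(found)
```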

labeler_agreement.py

cd scripts 
python labeler_agreement.py

Generates the following plots:

  • script_plots/agreement_stats_coasttrain_naip_s2.png
  • script_plots/agreement_stats_coasttrain_naip_s2_IOU.png
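
The output file names suggest that intersection over union (IoU) is one of the agreement statistics. For readers unfamiliar with the measure, here is a minimal sketch of mean IoU between two integer label images using NumPy. It is an assumption-laden illustration, not the implementation in labeler_agreement.py: `mean_iou` is a hypothetical function, and the two small arrays are synthetic.

```python
import numpy as np

def mean_iou(labels_a, labels_b, num_classes):
    """Mean intersection-over-union between two integer label images.

    Classes absent from both images are skipped so they do not distort the mean.
    """
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    ious = []
    for c in range(num_classes):
        in_a = labels_a == c
        in_b = labels_b == c
        union = np.logical_or(in_a, in_b).sum()
        if union == 0:
            continue
        intersection = np.logical_and(in_a, in_b).sum()
        ious.append(intersection / union)
    return float(np.mean(ious)) if ious else float("nan")

# Two synthetic 2x3 label images from hypothetical labelers.
a = np.array([[0, 0, 1], [1, 1, 2]])
b = np.array([[0, 1, 1], [1, 1, 2]])
score = mean_iou(a, b, num_classes=3)
print(score)
```

Here the per-class IoUs are 1/2, 3/4, and 1, so the mean is 0.75.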

plot_montage.py

cd scripts 
python plot_montage.py

Produces a montage of example imagery, labels, and overlay masks for each of the datasets, generating the following figures:

  • script_plots/example_coasttrain_naip.png
  • script_plots/example_coasttrain_naip6class.png
  • script_plots/example_coasttrain_quads.png
  • script_plots/example_coasttrain_madeira.png
  • script_plots/example_coasttrain_dauphin.png
  • script_plots/example_coasttrain_sandwich.png
  • script_plots/example_coasttrain_s2.png
  • script_plots/example_coasttrain_s2_4class.png
  • script_plots/example_coasttrain_l8.png
  • script_plots/example_coasttrain_l8elwha.png

plot_montage_remapped.py

cd scripts 
python plot_montage_remapped.py

Produces a montage of example imagery, labels, and overlay masks for each of the datasets, with classes remapped into 7 superclasses, generating the following figures:

  • script_plots/example_coasttrain_naip_remapped.png
  • script_plots/example_coasttrain_naip6class_remapped.png
  • script_plots/example_coasttrain_quads_remapped.png
  • script_plots/example_coasttrain_madeira_remapped.png
  • script_plots/example_coasttrain_dauphin_remapped.png
  • script_plots/example_coasttrain_s2_remapped.png
  • script_plots/example_coasttrain_s2_4class_remapped.png
  • script_plots/example_coasttrain_l8elwha_remapped.png
  • script_plots/example_coasttrain_l8_remapped.png
  • script_plots/example_coasttrain_sandwich_remapped.png

🎣 Generating remapped classes

You may use the Doodler utility gen_remapped_images_and_labels.py with the provided .json config files in the remap_config_files folder. When prompted with "Select file containing super class names and class aliases", select one of the provided config (json) files. For each npz file in the dataset that the config file describes, the program creates a remapped label image with the suffix _remap_label.jpg, along with the image itself and a semi-transparent overlay showing the colorized mask.
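
Conceptually, remapping replaces each original class code with the code of the superclass it is aliased to. The sketch below shows that idea in plain Python. It is not the Doodler utility's code: the alias table, class names, and `build_integer_remap` are all hypothetical, and the real config files in remap_config_files may be structured differently.

```python
# Hypothetical alias table: original class name -> superclass name.
# The real config files in remap_config_files may be structured differently.
ALIASES = {
    "water": "water",
    "whitewater": "water",
    "sand": "sediment",
    "gravel": "sediment",
}

def build_integer_remap(classes, aliases):
    """Map each original class index to a superclass index.

    Returns (remap, superclasses), where remap[i] is the superclass index
    of original class i, and superclasses lists names in first-seen order.
    """
    superclasses = []
    remap = []
    for name in classes:
        target = aliases[name]
        if target not in superclasses:
            superclasses.append(target)
        remap.append(superclasses.index(target))
    return remap, superclasses

classes = ["water", "whitewater", "sand", "gravel"]
remap, superclasses = build_integer_remap(classes, ALIASES)

# Apply to a tiny synthetic label image (rows of integer class codes).
label = [[0, 1, 2], [3, 2, 0]]
remapped = [[remap[v] for v in row] for row in label]
print(superclasses, remapped)
```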