
The AtomFeat type

Rachel Kurchin edited this page Oct 15, 2020 · 2 revisions

This type stores featurization metadata and travels with the feature vectors/matrices it describes, so that encoded features (i.e. vectors of 0s and 1s) can always be decoded back to human-understandable quantities.

Fields

  • name::Symbol: The name of the feature (e.g. Symbol("Atomic Mass"))
  • categorical::Bool: A flag that indicates whether the feature is categorical (e.g. block, which can be s, p, d, or f) or continuous (e.g. atomic radius, which can take on a continuous range of values)
  • num_bins::Integer: Length of the associated feature vector
  • logspaced::Bool: Flag storing whether values are logarithmically or linearly spaced. This is generally only relevant for continuous (numerical) features. Whether the field is necessary is still under debate, since the values are stored explicitly and arbitrary spacing could conceivably be allowed anyway; it is kept for now and could be useful later, e.g. for plotting.
  • vals::Vector: List of values. For categorical features its length equals num_bins (one entry per category); for continuous features its length is num_bins+1, because the entries specify the edges of the bins used for the one-hot encoding.
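
To make the relationship between num_bins and vals concrete, here is a minimal sketch in Julia. It is not the package's actual definition: SketchAtomFeat and onehot_sketch are hypothetical names introduced for illustration, and the handling of out-of-range values is an assumption.

```julia
# Hypothetical stand-in for AtomFeat. The field names follow the list above,
# but this is an illustrative sketch, not the package's definition.
struct SketchAtomFeat
    name::Symbol
    categorical::Bool
    num_bins::Integer
    logspaced::Bool
    vals::Vector
end

# One-hot encode a single value under one feature (hypothetical helper).
function onehot_sketch(feat::SketchAtomFeat, value)
    vec = zeros(Int, feat.num_bins)
    if feat.categorical
        # vals holds the categories themselves: one slot per category
        idx = findfirst(isequal(value), feat.vals)
        idx === nothing && throw(ArgumentError("unknown category $value"))
    else
        # vals holds num_bins+1 bin edges; reject values outside them (assumption)
        first(feat.vals) <= value <= last(feat.vals) ||
            throw(ArgumentError("value $value is outside the bin edges"))
        # index of the last edge <= value; a value on the top edge goes in the last bin
        idx = min(searchsortedlast(feat.vals, value), feat.num_bins)
    end
    vec[idx] = 1
    return vec
end

# Categorical: 4 categories, so 4 values and 4 bins
block = SketchAtomFeat(:Block, true, 4, false, ["s", "p", "d", "f"])

# Continuous: 4 bins need 5 edges (0.5, 1.0, 1.5, 2.0, 2.5)
radius = SketchAtomFeat(Symbol("Atomic radius"), false, 4, false,
                        collect(range(0.5, stop=2.5, length=5)))

onehot_sketch(block, "d")   # [0, 0, 1, 0]
onehot_sketch(radius, 1.2)  # [0, 1, 0, 0], since 1.0 <= 1.2 < 1.5
```

Storing the edges explicitly means the encoder never needs to consult logspaced; under this sketch, that flag only records how the edges were generated.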

Notable Functions

  • Several constructors are available; they automatically populate or infer certain fields based on the inputs.
  • build_atom_feats: Build a featurization scheme (Array of AtomFeat objects) given vectors of metadata. The only required input is a vector of Symbols specifying which features to pull from the database (stored in atom_data_df); additional arguments customize the behavior further.
  • make_feature_vectors: Builds feature vectors associated with each atom for which all requested features are available. Has two call signatures: one where a featurization is supplied, and one that takes the same arguments as build_atom_feats and builds the featurization from them.
  • decode_feature_vector(vec::Vector, featurization::Vector{AtomFeat}): given an encoded vector and associated featurization scheme, decode it and return a dictionary from feature name to values (for categorical features) or ranges of values (for continuous features)
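
The decoding step described above can be sketched as follows. This is an illustration only, not the package's implementation: SketchAtomFeat and decode_sketch are hypothetical names, and the sketch assumes that an atom's encoded vector is the concatenation of the per-feature one-hot vectors, in the same order as the featurization.

```julia
# Hypothetical stand-in for AtomFeat (same fields as listed under "Fields").
struct SketchAtomFeat
    name::Symbol
    categorical::Bool
    num_bins::Integer
    logspaced::Bool
    vals::Vector
end

# Decode a concatenated one-hot vector into a Dict from feature name to
# a category (categorical) or a (lower, upper) bin-edge pair (continuous).
function decode_sketch(vec::Vector, feats::Vector{SketchAtomFeat})
    sum(f.num_bins for f in feats) == length(vec) ||
        throw(ArgumentError("vector length does not match the featurization"))
    decoded = Dict{Symbol,Any}()
    offset = 0
    for f in feats
        # slice out the part of the vector that belongs to this feature
        chunk = vec[offset+1:offset+f.num_bins]
        idx = findfirst(isequal(1), chunk)
        idx === nothing && throw(ArgumentError("no bin set for feature $(f.name)"))
        # categorical: the value itself; continuous: the edges of the occupied bin
        decoded[f.name] = f.categorical ? f.vals[idx] : (f.vals[idx], f.vals[idx+1])
        offset += f.num_bins
    end
    return decoded
end

feats = [
    SketchAtomFeat(:Block, true, 4, false, ["s", "p", "d", "f"]),
    SketchAtomFeat(:Radius, false, 2, false, [0.5, 1.5, 2.5]),
]

decoded = decode_sketch([0, 0, 1, 0, 0, 1], feats)
# decoded[:Block]  == "d"
# decoded[:Radius] == (1.5, 2.5)
```

Because each feature carries its own vals, the decoder needs nothing beyond the vector and the featurization, which is the point of carrying the metadata alongside the encoded features.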