The AtomFeat type
Rachel Kurchin edited this page Oct 15, 2020 · 2 revisions
This type stores featurization metadata and is carried along with feature vectors/matrices to ensure that encoded features (i.e. a bunch of 0's and 1's) are always "decodable" to human-understandable quantities.
- `name::Symbol`: The name of the feature (e.g. "Atomic Mass")
- `categorical::Bool`: A flag that indicates whether the feature is categorical (e.g. block, which can be s, p, d, or f) or continuous (e.g. atomic radius, which can take on a continuous range of values)
- `num_bins::Integer`: Length of the associated feature vector
- `logspaced::Bool`: Flag storing whether values are logarithmically or linearly spaced. This is generally only relevant for numerical features. It may not be strictly necessary, since the values are stored explicitly and arbitrary spacing could conceivably be allowed, but it is here for now and could be useful later on, e.g. for plotting.
- `vals::Vector`: List of values. Its length is equal to `num_bins` for categorical features, or `num_bins+1` for numerical ones, because in that case the values specify the edges of the bins for the one-hot encoding.
- There are several constructor options in which certain fields are automatically populated or inferred from the inputs.
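
The field layout above can be pictured with a small sketch. This is not the package's actual definition: the names `AtomFeatSketch`, `categorical_feat`, and `continuous_feat` are hypothetical, and the constructors only illustrate how fields such as `num_bins` and `vals` might be inferred from the inputs.

```julia
# Hypothetical stand-in for the type described above. The field names follow
# the list; the type name and both helper constructors are assumptions.
struct AtomFeatSketch
    name::Symbol
    categorical::Bool
    num_bins::Integer
    logspaced::Bool
    vals::Vector
end

# Categorical feature: num_bins is inferred from the number of allowed values.
categorical_feat(name::Symbol, vals::Vector) =
    AtomFeatSketch(name, true, length(vals), false, vals)

# Continuous feature: vals holds num_bins + 1 bin edges between lo and hi,
# spaced linearly or logarithmically.
function continuous_feat(name::Symbol, lo::Real, hi::Real, num_bins::Integer;
                         logspaced::Bool=false)
    if logspaced
        edges = exp10.(range(log10(lo), log10(hi); length=num_bins + 1))
    else
        edges = collect(range(lo, hi; length=num_bins + 1))
    end
    return AtomFeatSketch(name, false, num_bins, logspaced, edges)
end

block = categorical_feat(:Block, ["s", "p", "d", "f"])
radius = continuous_feat(:Radius, 0.5, 3.0, 5)
```

Here `block.vals` has exactly `num_bins` entries, while `radius.vals` has `num_bins + 1` entries because it stores bin edges rather than the values themselves.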
- `build_atom_feats`: Builds a featurization scheme (an Array of `AtomFeat` objects) given vectors of metadata. The only required input is a vector of Symbols specifying which features to pull from the database (stored in `atom_data_df`), but it takes further arguments to customize behavior.
- `make_feature_vectors`: Builds feature vectors associated with each atom for which all requested features are available. It has two call signatures: one where a featurization is supplied, and one that builds the featurization itself by taking the arguments of `build_atom_feats` as inputs.
- `decode_feature_vector(vec::Vector, featurization::Vector{AtomFeat})`: Given an encoded vector and its associated featurization scheme, decodes it and returns a dictionary from feature name to values (for categorical features) or ranges of values (for continuous features)
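
To make the encode/decode round trip concrete, here is a minimal sketch of the idea. It is not the package's implementation: `FeatMeta`, `onehot`, and `decode_sketch` are hypothetical names, the struct is a trimmed-down stand-in for `AtomFeat`, and the assumption that each bin covers the half-open range from one edge to the next is mine.

```julia
# Hypothetical, trimmed-down stand-in for AtomFeat (assumption, not the real type).
struct FeatMeta
    name::Symbol
    categorical::Bool
    num_bins::Int
    vals::Vector
end

# One-hot encode a single value according to its metadata.
function onehot(f::FeatMeta, x)
    v = zeros(Int, f.num_bins)
    if f.categorical
        idx = findfirst(isequal(x), f.vals)
    else
        # Assumed convention: bin i covers [vals[i], vals[i+1]).
        # Out-of-range values are clamped into the edge bins (a simplification).
        idx = clamp(searchsortedlast(f.vals, x), 1, f.num_bins)
    end
    idx === nothing && error("value $x is not valid for feature $(f.name)")
    v[idx] = 1
    return v
end

# Decode a concatenated one-hot vector by walking the featurization in order.
function decode_sketch(encoded::Vector, featurization::Vector{FeatMeta})
    decoded = Dict{Symbol,Any}()
    start = 1
    for f in featurization
        chunk = encoded[start:start + f.num_bins - 1]
        idx = findfirst(==(1), chunk)
        # Categorical features decode to a value, continuous ones to a bin range.
        decoded[f.name] = f.categorical ? f.vals[idx] : (f.vals[idx], f.vals[idx + 1])
        start += f.num_bins
    end
    return decoded
end

feats = [FeatMeta(:Block, true, 4, ["s", "p", "d", "f"]),
         FeatMeta(:Radius, false, 3, [0.0, 1.0, 2.0, 3.0])]
encoded = vcat(onehot(feats[1], "p"), onehot(feats[2], 1.5))
decoded = decode_sketch(encoded, feats)
```

In this sketch `encoded` is `[0, 1, 0, 0, 0, 1, 0]`, and decoding recovers `"p"` for `:Block` and the range `(1.0, 2.0)` for `:Radius`. This is why the metadata has to travel with the vectors: without `vals` and `num_bins`, the 0's and 1's cannot be mapped back to meaningful quantities.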