More documentation and minor changes.
Bhargav Mangipudi committed Apr 9, 2017
1 parent d75a4f6 commit 08dec8b
Showing 5 changed files with 80 additions and 12 deletions.
21 changes: 15 additions & 6 deletions saul-core/doc/CONCEPTUALSTRUCTURE.md
@@ -37,19 +37,28 @@ In this definition `pos` is defined to be a property of nodes of type token. The
inside `{ .... }` is the definition of a sensor which, given an object of type `ConllRawToken` (i.e. the type of the node),
generates an output property value (in this case, using the POS tag of an object of type `ConllRawToken`).

If the content of a property is computationally intensive to compute, you can cache its value, by setting `cache` to be
If the content of a property is computationally intensive to compute, you can cache its Feature Vector, by setting `cacheFeatureVector` to be
`true`:
```scala
val pos = property(token, cache = true) {
val pos = property(token, cacheFeatureVector = true) {
(t: ConllRawToken) => t.POS
}
```

The first time that a property is called with a specific value, it remembers the corresponding output,
so next time it just looks up the value from the cache.
The feature vector is computed during the first training iteration and reused from the cache in later training/testing iterations. The value stays cached in memory for the lifetime of the app, so use this feature judiciously and make sure you have enough free memory (RAM) available.

Note that when training, the property cache is removed between two training iterations in order not to interrupt
the training procedure.
If you want to cache the value of a feature during a single iteration, use the `cache` parameter.

The `cache` parameter allows the value to be cached within a training/testing iteration. This is useful if one of your features depends on evaluating a Classifier on other instances as well. Such recursive evaluation of the Classifier can be expensive, and caching speeds it up. See a sample usage of this parameter in the [POSTagging Example](../../saul-examples/src/main/scala/edu/illinois/cs/cogcomp/saulexamples/nlp/POSTagger/POSDataModel.scala#L66).

Usage:
```scala
val posWindow = property(token, cache = true) {
(t: ConllRawToken) => t.getNeighbors.map(n => posWindow(n))
}
```

The cached values of these properties are cleared at the end of each training iteration.
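
The difference between the two cache scopes can be illustrated with a small standalone sketch. It does not use Saul at all: `CachedSensor`, `perIteration`, `perLifetime` and `endIteration` are hypothetical names invented for this illustration, and the sketch caches plain sensor values where `cacheFeatureVector` would cache a feature vector. It only assumes the behaviour described above, namely that the `cache` scope is cleared after each iteration while the feature vector cache is kept.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a property with two cache scopes; not Saul's implementation.
object CacheScopesSketch {
  final class CachedSensor[T, V](sensor: T => V) {
    var sensorCalls = 0 // counts how often the underlying sensor actually runs
    private val iterationCache = mutable.HashMap.empty[T, V]
    private val lifetimeCache = mutable.HashMap.empty[T, V]

    private def compute(t: T): V = { sensorCalls += 1; sensor(t) }

    // Analogue of `cache = true`: reused only until the iteration ends.
    def perIteration(t: T): V = iterationCache.getOrElseUpdate(t, compute(t))

    // Analogue of `cacheFeatureVector = true`: reused across iterations.
    def perLifetime(t: T): V = lifetimeCache.getOrElseUpdate(t, compute(t))

    // Only the per-iteration cache is cleared between iterations.
    def endIteration(): Unit = iterationCache.clear()
  }

  // Runs three iterations with two lookups each and returns the sensor call
  // counts for the per-iteration scope and the lifetime scope.
  def demo(): (Int, Int) = {
    val a = new CachedSensor[String, Int](_.length)
    for (_ <- 1 to 3) { a.perIteration("token"); a.perIteration("token"); a.endIteration() }
    val b = new CachedSensor[String, Int](_.length)
    for (_ <- 1 to 3) { b.perLifetime("token"); b.perLifetime("token"); b.endIteration() }
    (a.sensorCalls, b.sensorCalls)
  }
}
```

In this sketch the per-iteration scope recomputes once per iteration, while the lifetime scope computes once overall. That is why the second scope trades memory for speed.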

#### Parameterized properties
Suppose you want to define properties which get some parameters; this can be important when we want to programmatically
@@ -51,8 +51,12 @@ abstract class Learnable[T <: AnyRef](val node: Node[T], val parameters: Paramet
def feature: List[Property[T]] = node.properties.toList

/** filter out the label from the features */
def combinedProperties = if (label != null) new CombinedDiscreteProperty[T](this.feature.filterNot(_.name == label.name))
else new CombinedDiscreteProperty[T](this.feature)
def combinedProperties = {
val features = if (label != null) this.feature.filterNot(_.name == label.name) else this.feature

// Support Feature Vector Caching during training.
new CombinedDiscreteProperty[T](features, supportsFeatureVectorCaching = true)
}

def lbpFeatures = Property.convertToClassifier(combinedProperties)

@@ -160,6 +160,27 @@ trait DataModel extends Logging {
e
}

/** Helper class to facilitate creating new [[Property]] instances.
*
* Note:
* - The `cache` parameter is used to cache a property sensor's value within a single training iteration. This is
* useful if properties are defined recursively.
* - The `cacheFeatureVector` parameter caches the FeatureVector for instances of this property. Thus, feature
* extraction is performed only once during the training/testing process. This can lead to an increase in RAM
* usage but will lead to speed-up in training iterations. Recommended to use with static features which have high
* feature extraction effort.
*
* @param node [[Node]] instance to add the current property to.
* @param name Name of the property.
* @param cache Boolean indicating if this property sensor's value should be cached within training iterations.
* @param ordered Denoting if the order among the values in this property needs to be preserved. Only applies to
* collection based properties.
* @param cacheFeatureVector Boolean indicating if this property's feature vector should be cached during
* training/testing. Caching feature vector saves redundant feature extraction during
* training/testing.
* @tparam T Data type of the node that this property is associated with.
* @return [[PropertyApply]] instance
*/
class PropertyApply[T <: AnyRef] private[DataModel] (
val node: Node[T],
name: String,
@@ -290,6 +311,27 @@ trait DataModel extends Logging {
}
}

/** Function to create a new [[Property]] instance inside a DataModel.
*
* Note:
* - The `cache` parameter is used to cache a property sensor's value within a single training iteration. This is
* useful if properties are defined recursively.
* - The `cacheFeatureVector` parameter caches the FeatureVector for instances of this property. Thus, feature
* extraction is performed only once during the training/testing process. This can lead to an increase in RAM
* usage but will lead to speed-up in training iterations. Recommended to use with static features which have high
* feature extraction effort.
*
* @param node [[Node]] instance to add the current property to.
* @param name Name of the property.
* @param cache Boolean indicating if this property sensor's value should be cached within training iterations.
* @param ordered Denoting if the order among the values in this property needs to be preserved. Only applies to
* collection based properties.
* @param cacheFeatureVector Boolean indicating if this property's feature vector should be cached during
* training/testing. Caching feature vector saves redundant feature extraction during
* training/testing.
* @tparam T Data type of the node that this property is associated with.
* @return Property instance wrapped in a helper class [[PropertyApply]]
*/
def property[T <: AnyRef](node: Node[T], name: String = "prop" + properties.size, cache: Boolean = false, ordered: Boolean = false, cacheFeatureVector: Boolean = false) =
new PropertyApply[T](node, name, cache, ordered, cacheFeatureVector)

@@ -80,11 +80,16 @@ class Node[T <: AnyRef](val keyFunc: T => Any = (x: T) => x, val tag: ClassTag[T
}

def clear(): Unit = {
// Clear property caches
propertyCacheList.foreach(_.clear())
propertyFeatureVectorCache.clear()

collection.clear
trainingSet.clear
testingSet.clear
for (e <- incoming) e.clear
for (e <- outgoing) e.clear

for (e <- incoming) e.clear()
for (e <- outgoing) e.clear()
}

private var count: AtomicInteger = new AtomicInteger()
@@ -312,6 +317,7 @@ class Node[T <: AnyRef](val keyFunc: T => Any = (x: T) => x, val tag: ClassTag[T
}
}

/** WeakHashMap instance to cache property's [[FeatureVector]] instances during training/testing */
private[saul] final val propertyFeatureVectorCache = new mutable.WeakHashMap[T, mutable.HashMap[Property[_], FeatureVector]]()

/** list of hashmaps used inside properties for caching sensor values */
@@ -14,7 +14,14 @@ import java.util
import scala.collection.mutable
import scala.reflect.ClassTag

case class CombinedDiscreteProperty[T <: AnyRef](atts: List[Property[T]])(implicit val tag: ClassTag[T]) extends TypedProperty[T, List[_]] {
/** Represents a collection of properties.
*
* @param atts List of properties (attributes).
* @param supportsFeatureVectorCaching Boolean to denote if feature vector caching should be supported.
* @param tag ClassTag of the property's input type.
* @tparam T Property's input type.
*/
case class CombinedDiscreteProperty[T <: AnyRef](atts: List[Property[T]], supportsFeatureVectorCaching: Boolean = false)(implicit val tag: ClassTag[T]) extends TypedProperty[T, List[_]] {

override val sensor: (T) => List[_] = {
t: T => atts.map(att => att.sensor(t))
@@ -31,7 +38,7 @@ case class CombinedDiscreteProperty[T <: AnyRef](atts: List[Property[T]])(implic
atts.foreach(property => {
val extractedFeatureVector = {
// Handle caching of Feature Vector
if (property.cacheFeatureVector && property.isInstanceOf[NodeProperty[T]]) {
if (supportsFeatureVectorCaching && property.cacheFeatureVector && property.isInstanceOf[NodeProperty[T]]) {
val nodeProperty = property.asInstanceOf[NodeProperty[T]]
val instanceCacheMap = nodeProperty.node.propertyFeatureVectorCache
.getOrElseUpdate(instance, new mutable.HashMap[Property[_], FeatureVector]())
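
The lookup pattern in this hunk (a per-instance `WeakHashMap` whose values are per-property maps, guarded by the `supportsFeatureVectorCaching` flag) can be sketched outside of Saul as follows. This is a simplified illustration under stated assumptions, not the code from this commit: `Token`, `Prop`, `Vec`, `Store` and `extract` are hypothetical stand-ins for the instance type, `Property`, `FeatureVector`, the owning `Node` and the real feature extraction.

```scala
import scala.collection.mutable

// Hypothetical stand-ins; only the caching pattern mirrors the hunk above.
object FeatureVectorCacheSketch {
  final class Token(val text: String)
  final case class Prop(name: String, cacheFeatureVector: Boolean)
  type Vec = Vector[Double]

  final class Store {
    var extractions = 0 // counts how often feature extraction actually runs

    // Weak keys: an entry can be collected once its instance is no longer referenced.
    private val cache = new mutable.WeakHashMap[Token, mutable.HashMap[Prop, Vec]]()

    private def extract(instance: Token, prop: Prop): Vec = {
      extractions += 1
      Vector(instance.text.length.toDouble)
    }

    // Cache only when both the caller and the property opt in.
    def featureVector(instance: Token, prop: Prop, supportsCaching: Boolean): Vec =
      if (supportsCaching && prop.cacheFeatureVector) {
        cache
          .getOrElseUpdate(instance, new mutable.HashMap[Prop, Vec]())
          .getOrElseUpdate(prop, extract(instance, prop))
      } else extract(instance, prop)
  }

  // Returns the extraction count after three cached lookups, then the total
  // after three more lookups with caching switched off by the caller.
  def demo(): (Int, Int) = {
    val store = new Store
    val token = new Token("saul")
    val pos = Prop("pos", cacheFeatureVector = true)
    for (_ <- 1 to 3) store.featureVector(token, pos, supportsCaching = true)
    val withCaching = store.extractions
    for (_ <- 1 to 3) store.featureVector(token, pos, supportsCaching = false)
    (withCaching, store.extractions)
  }
}
```

Keying the outer map weakly by instance means cached vectors do not by themselves keep instances alive, and the caller-side flag lets a combined property that is not used for training skip the cache even when the individual property requests it.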