Markov Models

TGen supports the use of Markov models to allow the user to control how TCP streams are created. TGen uses Markov models for three distinct processes:

  • In a traffic action, a Markov model can be used to configure the flow creation process, i.e., the frequency with which new flows should be created. The Markov model specifies inter-flow delay distributions. This model is configured with the flowmodelpath and markovmodelseed attributes on the traffic action.
  • In a flow action, a Markov model can be used to configure the stream creation process, i.e., the frequency with which new TCP streams should be created. The Markov model specifies inter-stream delay distributions. This model is configured with the streammodelpath and markovmodelseed attributes on the flow action. It can also be configured on a traffic action in order to apply the model to all flows generated by the action.
  • In a stream action, a Markov model can be used to configure the packet creation process on the associated TCP stream, i.e., the frequency with which packets should be created. The Markov model specifies inter-packet delay distributions. This model is configured with the packetmodelpath and markovmodelseed attributes on the stream action. It can also be configured on traffic or flow actions in order to apply the model to all TCP streams generated by the associated flows.

More information about how to set up a Markov model in your TGen configuration file can be found in the doc/TGen-Options.md file.

The remainder of this document explains the Markov model file format that TGen supports, and provides examples of how to generate Markov models that will pass TGen's Markov model validation.

File Format and Structure

As with the config file, TGen uses the graphml file format to represent Markov models. As we explain the structure supported by TGen, we provide examples of generating the corresponding graphml elements and attributes using python3 and the networkx python3 module (installing the TGenTools toolkit will install the networkx module).

Models are constructed as directed graphs:

import networkx

G = networkx.DiGraph()

Generally, the Markov model specifies a set of Markov model states and a set of transitions between pairs of states. Each state is also associated with a set of observations, and a set of emissions between states and observations.

Vertices: Markov model states and observations

Vertices in the graph can either be Markov model "states" or "observations", and the type is encoded in the graph using the type attribute on the graph node. Each vertex must specify a type and a name. Vertex ids are required and must be unique, but are otherwise not used by TGen.

G.add_node('s0', type="state", name='start')
G.add_node('s1', type="state", name='anything_you_want')

The graph must contain one and only one vertex of type state whose name is start. This tells TGen in which state the Markov model begins. The names of the other vertices of type state are insignificant and can be set to any string.

G.add_node('o1', type="observation", name='+')
G.add_node('o2', type="observation", name='-')

Vertices of type observation must set one of the following as the name, which encodes the action to be taken upon reaching a particular vertex. Valid name strings are:

  • +: Generate a packet from client to server (for packet models) or a new stream (for stream models)
  • -: Generate a packet from server to client (for packet models) or a new stream (for stream models)
  • F: Stop generating new packets (for packet models) or new streams (for stream models)

For stream models, there is no difference between + and -: you can simply use + to indicate new stream creation on stream models.
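The vertex rules above can be checked before a model is handed to TGen. The following is a minimal sketch of such a check, not TGen's own validation logic; the helper name check_vertices is hypothetical, and it only covers the rules stated in this section (type and name present, exactly one start state, observation names limited to +, -, and F).

```python
import networkx

VALID_OBSERVATION_NAMES = {"+", "-", "F"}

def check_vertices(G):
    """Return a list of problems found in the graph's vertices (empty if none)."""
    problems = []
    start_count = 0
    for node_id, attrs in G.nodes(data=True):
        node_type = attrs.get("type")
        name = attrs.get("name")
        if node_type not in ("state", "observation"):
            problems.append(f"{node_id}: unknown or missing type {node_type!r}")
        elif name is None:
            problems.append(f"{node_id}: missing name")
        elif node_type == "state" and name == "start":
            start_count += 1
        elif node_type == "observation" and name not in VALID_OBSERVATION_NAMES:
            problems.append(f"{node_id}: invalid observation name {name!r}")
    # The model must contain one and only one state named 'start'.
    if start_count != 1:
        problems.append(f"expected exactly one start state, found {start_count}")
    return problems

G = networkx.DiGraph()
G.add_node('s0', type="state", name='start')
G.add_node('s1', type="state", name='anything_you_want')
G.add_node('o1', type="observation", name='+')
G.add_node('o2', type="observation", name='-')

print(check_vertices(G))  # an empty list means no problems were found
```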

Edges: Markov model state transitions and emissions

Edges in the graph can either be Markov model "transitions" or "emissions", and the type is encoded in the graph using the type attribute on the graph edge. Each edge must specify a type and a weight. The source and target vertex ids must match those that were defined when creating the vertices.

G.add_edge('s0', 's1', type='transition', weight=1.0)
G.add_edge('s1', 's1', type='transition', weight=1.0)

Edges of type transition instruct TGen how to move between pairs of vertices of type state. For vertices of type state with multiple outgoing edges of type transition, TGen randomly selects one outgoing edge according to the weighted probabilities (each edge's probability is computed by dividing its weight by the sum of the weights of all outgoing transition edges from the same state vertex).
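To make the weight normalization concrete, here is a small sketch that computes the transition probabilities for one state exactly as described above (each weight divided by the sum of the outgoing transition weights). The helper name transition_probabilities and the three-state graph are made up for illustration and are not part of TGen.

```python
import networkx

def transition_probabilities(G, state):
    """Map each successor state to weight / (sum of outgoing transition weights)."""
    weights = {
        target: attrs["weight"]
        for _, target, attrs in G.out_edges(state, data=True)
        if attrs.get("type") == "transition"
    }
    total = sum(weights.values())
    return {target: w / total for target, w in weights.items()}

G = networkx.DiGraph()
G.add_node('s0', type="state", name='start')
G.add_node('s1', type="state", name='a')
G.add_node('s2', type="state", name='b')
G.add_edge('s0', 's1', type='transition', weight=3.0)
G.add_edge('s0', 's2', type='transition', weight=1.0)

print(transition_probabilities(G, 's0'))  # {'s1': 0.75, 's2': 0.25}
```

Because only the ratios matter, the weights do not need to sum to 1. Note that this sketch divides by the total weight, so it assumes the state has at least one outgoing transition edge with a positive weight.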

G.add_edge('s1', 'o1', type='emission', weight=0.5, distribution='normal', param_location=5000000, param_scale=1000000)
G.add_edge('s1', 'o2', type='emission', weight=0.5, distribution='exponential', param_rate=0.001)

Whenever TGen moves between states, there is an associated event observation. Edges of type emission instruct TGen how to move between vertices of type state and vertices of type observation. For vertices of type state with multiple outgoing edges of type emission, TGen randomly selects one outgoing edge according to the weighted probabilities (each edge's probability is computed by dividing its weight by the sum of the weights of all outgoing emission edges from the same state vertex).

Once an emission edge has been selected, the observation vertex connected by the edge tells TGen which type of action to take. Additionally, each emission edge must specify the distribution attribute and the associated parameters for that distribution. These distributions encode the time delay, in microseconds, that TGen should insert after the observation before transitioning to the next state.

The following delay distributions are currently supported (more can be added as the need arises):

  • uniform: a uniform distribution requires the attributes param_low (a) and param_high (b) such that a <= b to generate values uniformly in the range [a, b]
  • normal: a normal distribution requires the attributes param_location (mu) and param_scale (sigma)
  • lognormal: a lognormal distribution requires the attributes param_location (mu) and param_scale (sigma)
  • exponential: an exponential distribution requires the attribute param_rate (lambda)
  • pareto: a Pareto distribution requires the attributes param_scale (xm) and param_shape (alpha)
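To get a feel for the delays a set of parameters produces, the distributions listed above can be sampled with Python's standard random module. This is only a sketch: the helper name sample_delay is hypothetical, and it assumes the textbook parameterization of each distribution. TGen's own sampling code may differ in details, for example in how it treats negative draws from a normal distribution.

```python
import random

def sample_delay(edge_attrs, rng=random):
    """Draw one delay from the distribution described by an emission edge's attributes."""
    dist = edge_attrs["distribution"]
    if dist == "uniform":
        return rng.uniform(edge_attrs["param_low"], edge_attrs["param_high"])
    if dist == "normal":
        return rng.gauss(edge_attrs["param_location"], edge_attrs["param_scale"])
    if dist == "lognormal":
        return rng.lognormvariate(edge_attrs["param_location"], edge_attrs["param_scale"])
    if dist == "exponential":
        return rng.expovariate(edge_attrs["param_rate"])
    if dist == "pareto":
        # paretovariate() draws with scale 1, so multiply by the scale xm.
        return edge_attrs["param_scale"] * rng.paretovariate(edge_attrs["param_shape"])
    raise ValueError(f"unsupported distribution: {dist!r}")

rng = random.Random(42)  # fixed seed so repeated runs draw the same values
emission = {"distribution": "exponential", "param_rate": 0.001}
print([round(sample_delay(emission, rng)) for _ in range(5)])
```

Under this assumed parameterization, an exponential distribution with param_rate=0.001 has a mean delay of 1000 microseconds.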

A final graph can be written to a file using:

networkx.write_graphml(G, 'sample.mmodel.graphml')
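As a quick sanity check, the written file can be loaded back with networkx.read_graphml to confirm that the attributes survived the round trip. The sketch below writes a tiny two-state model to a temporary directory (an arbitrary choice for illustration) and inspects it. It only verifies the file contents as networkx sees them, not whether TGen will accept the model.

```python
import os
import tempfile

import networkx

G = networkx.DiGraph()
G.add_node('s0', type="state", name='start')
G.add_node('s1', type="state", name='anything_you_want')
G.add_edge('s0', 's1', type='transition', weight=1.0)

# Write to a temporary directory so the example leaves the working directory untouched.
path = os.path.join(tempfile.mkdtemp(), 'sample.mmodel.graphml')
networkx.write_graphml(G, path)

# Load the file back and inspect the stored attributes.
H = networkx.read_graphml(path)
print(H.is_directed())
print(H.nodes['s0'])
print(H.edges['s0', 's1'])
```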

Examples

Below is a full example that combines the code snippets provided above. Another example script, which we use to generate our internal default packet and stream models, can be found in the repository at tools/scripts/generate_mmodel_graphml.py.

import networkx

G = networkx.DiGraph()

G.add_node('s0', type="state", name='start')
G.add_node('s1', type="state", name='anything_you_want')

G.add_node('o1', type="observation", name='+')
G.add_node('o2', type="observation", name='-')

G.add_edge('s0', 's1', type='transition', weight=1.0)
G.add_edge('s1', 's1', type='transition', weight=1.0)

G.add_edge('s1', 'o1', type='emission', weight=0.5, distribution='normal', param_location=5000000, param_scale=1000000)
G.add_edge('s1', 'o2', type='emission', weight=0.5, distribution='exponential', param_rate=0.001)

networkx.write_graphml(G, 'sample.mmodel.graphml')
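To see what such a model produces, the sketch below performs a simple random walk over the same graph: starting from the start state, it repeatedly picks a weighted transition and then a weighted emission from the new state, and records the observation names. This is an illustrative approximation of the process described in this document, not TGen's implementation. The function name walk_observations is hypothetical, delays are ignored, and every state reached by a transition is assumed to have outgoing edges of both types.

```python
import random

import networkx

def walk_observations(G, steps, seed=None):
    """Return the observation names emitted during a random walk of at most `steps` moves."""
    rng = random.Random(seed)

    def pick(state, edge_type):
        # Weighted choice among the outgoing edges of the requested type.
        edges = [(target, attrs["weight"])
                 for _, target, attrs in G.out_edges(state, data=True)
                 if attrs.get("type") == edge_type]
        targets, weights = zip(*edges)
        return rng.choices(targets, weights=weights)[0]

    # Locate the unique state named 'start'.
    state = next(n for n, a in G.nodes(data=True)
                 if a.get("type") == "state" and a.get("name") == "start")
    observed = []
    for _ in range(steps):
        state = pick(state, "transition")
        observation = pick(state, "emission")
        observed.append(G.nodes[observation]["name"])
        if observed[-1] == "F":  # 'F' stops the generation process
            break
    return observed

G = networkx.DiGraph()
G.add_node('s0', type="state", name='start')
G.add_node('s1', type="state", name='anything_you_want')
G.add_node('o1', type="observation", name='+')
G.add_node('o2', type="observation", name='-')
G.add_edge('s0', 's1', type='transition', weight=1.0)
G.add_edge('s1', 's1', type='transition', weight=1.0)
G.add_edge('s1', 'o1', type='emission', weight=0.5, distribution='normal', param_location=5000000, param_scale=1000000)
G.add_edge('s1', 'o2', type='emission', weight=0.5, distribution='exponential', param_rate=0.001)

print(walk_observations(G, steps=10, seed=1))
```

With equal emission weights of 0.5, the walk should emit + and - roughly equally often over a long run.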