Comparator
The MAEC Comparator is a python-maec API (formerly implemented as a separate utility), currently implemented in the Bundle module (bundle/bundle.py), that allows some basic comparisons to be made between two or more MAEC Bundles. Currently this works at the Object level (including Objects embedded as Associated Objects in Actions), but we plan on adding support for comparison between other MAEC entities in the near future.
Currently the comparator allows for the following comparisons to be made:
- Find all unique Objects in one or more MAEC Bundles
- Find all common* Objects between two or more MAEC Bundles
*By common, we mean those in different Bundles that are of the same type with certain matching properties (the set on which to match can be specified - see below), while ignoring those that are not relevant for such a comparison (e.g., ids and descriptions). For example, two File Objects in two different Bundles with the same File_Path would be considered common.
For finding common Objects in two or more MAEC Bundles, it is important to be able to control the set of properties that one cares to match on, especially when dealing with some of the more complicated Objects. As such, we've created a relatively simple way to define the set of Objects and their respective properties to match on. The syntax is the following Python dictionary:
match_on = {'RootObjectComplexType':['matching_element_name', 'other_matching_element_name']}
That is, for each type of Object one wishes to match on, one must simply specify the name of the Object's root complex type (e.g. FileObjectType), along with a list of the element names to use in the matching. Note that this list effectively constitutes an AND: every element specified for an Object type must match for two Objects of that type to be considered a match.
For example, if you wish to match only on File Objects that have the same File_Name and File_Path, one would create the following dictionary:
match_on = {"FileObjectType": ["file_name", "file_path"]}
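To make the AND semantics concrete, here is a minimal standalone sketch. It does not use python-maec: the Objects are reduced to plain dictionaries, and the helper `objects_match` is a hypothetical name introduced only for illustration, so the library's internal logic may differ.

```python
# Hypothetical helper, not part of python-maec: Objects are modeled as plain dicts.
def objects_match(obj_a, obj_b, elements):
    """Return True only if every listed element is present and equal in both."""
    return all(
        name in obj_a and name in obj_b and obj_a[name] == obj_b[name]
        for name in elements
    )

match_on = {"FileObjectType": ["file_name", "file_path"]}

file_a = {"file_name": "evil.exe", "file_path": "C:\\Temp\\evil.exe"}
file_b = {"file_name": "evil.exe", "file_path": "C:\\Windows\\evil.exe"}
file_c = {"file_name": "evil.exe", "file_path": "C:\\Temp\\evil.exe"}

# Same name but different path: the AND fails, so these are not common.
same_ab = objects_match(file_a, file_b, match_on["FileObjectType"])
# Both elements agree: these would be treated as common.
same_ac = objects_match(file_a, file_c, match_on["FileObjectType"])
```

Dropping "file_path" from the list would make all three files match, which is why the choice of elements matters for the more complicated Objects.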
Elements that are embedded in other element hierarchies may be specified by writing out the path to the element starting from the root of the Object, using '.' as a separator between the different layers of element names. For example, to match on the Path element in an Image_Info section of a Process Object, one would use:
match_on = {"ProcessObjectType": ["image_info.path"]}
Similarly, to match on multiple embedded elements that share the same parent path, simply use a '/' between each of the element names. For example, to match on the Path and Command_Line elements in an Image_Info section of a Process Object, one would use:
match_on = {"ProcessObjectType": ["image_info.path/command_line"]}
Finally, in the case of embedded list-based elements, use the '.' notation as above, but do not include the element used to signify a list entry. For example, to match on the Data element in the list of Values contained in a Registry Object, instead of "values.value.data", one would use:
match_on = {"WindowsRegistryKeyObjectType": ["values.data"]}
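The path notation described above can be pictured with a short standalone sketch. The functions `expand_spec` and `lookup` are hypothetical and are not python-maec APIs, and the Objects are modeled as nested dictionaries and lists, which is an assumption made purely for illustration.

```python
# Hypothetical helpers, not part of python-maec.
def expand_spec(spec):
    """Expand 'a.b/c' into ['a.b', 'a.c']; a spec without '/' yields one path."""
    prefix, _, last = spec.rpartition(".")
    leaves = last.split("/")
    return [prefix + "." + leaf if prefix else leaf for leaf in leaves]

def lookup(node, path):
    """Collect every value reachable via a dotted path, stepping through lists."""
    current = [node]
    for key in path.split("."):
        next_level = []
        for item in current:
            value = item.get(key) if isinstance(item, dict) else None
            if isinstance(value, list):
                # List entries are entered directly, so the path needs no
                # separate element name for the list entry itself.
                next_level.extend(value)
            elif value is not None:
                next_level.append(value)
        current = next_level
    return current

# Assumed plain-dict stand-ins for a Process Object and a Registry Key Object.
process = {"image_info": {"path": "C:\\a.exe", "command_line": "a.exe -x"}}
registry = {"values": [{"name": "Run", "data": "a.exe"},
                       {"name": "Other", "data": "b.exe"}]}

paths = expand_spec("image_info.path/command_line")
process_values = [lookup(process, p) for p in paths]
registry_values = lookup(registry, "values.data")
```

In this sketch, "values.data" reaches the data of every entry in the list, mirroring the rule that the list-entry element name is left out of the path.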
If one does not wish to specify the Objects and properties to match on, the API includes a default dictionary for this purpose, which covers some commonly observed Objects and some of their relevant properties. Currently this dictionary is the following:
match_on = {"FileObjectType": ["file_name", "file_path"],
"WindowsRegistryKeyObjectType": ["hive","key"],
"WindowsMutexObjectType": ["name"],
"SocketObjectType": ["address_value", "port_value"],
"WindowsPipeObjectType": ["name"],
"ProcessObjectType": ["name"]}
The three input parameters to the comparator class are:
- A list of Bundles (specifically, python-maec bundle.Bundle instances) to be compared.
- An optional dictionary describing the Objects and their elements to match on (as described above). If not specified, the default dictionary described in the previous section will be used.
- An optional Boolean for specifying whether to perform case sensitive matches. If not specified, it defaults to True, making all matches case sensitive.
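The effect of the case-sensitivity flag can be pictured with a small standalone sketch. The helper `values_equal` is hypothetical and is not part of python-maec; it only assumes that a case-insensitive match compares lowercased strings, which may differ from how the library handles it internally.

```python
# Hypothetical helper, not part of python-maec.
def values_equal(left, right, case_sensitive=True):
    """Compare two property values, optionally ignoring case for strings."""
    if not case_sensitive and isinstance(left, str) and isinstance(right, str):
        return left.lower() == right.lower()
    return left == right

# With the default (case sensitive), differently cased paths do not match.
strict = values_equal("C:\\Temp\\EVIL.EXE", "c:\\temp\\evil.exe")
# With case sensitivity turned off, the same two paths match.
relaxed = values_equal("C:\\Temp\\EVIL.EXE", "c:\\temp\\evil.exe",
                       case_sensitive=False)
```

Turning case sensitivity off is likely to be most useful for properties such as Windows file paths and registry keys, where casing often varies between tools.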
To instantiate and use the MAEC comparator, simply import the python-maec MAEC Bundle class from the bundle module and call the 'compare' Bundle class method:
from maec.bundle.bundle import Bundle
comparison_results = Bundle.compare(bundle_list, match_on, case_sensitive)
This will perform both the unique (for each Bundle in the bundle_list) and common (between all of the Bundles in the bundle_list) comparisons and return a ComparisonResults object, described in the next section.
Calling compare() on the Bundle returns a ComparisonResults object that contains the results of both the unique and intersecting comparisons. Accordingly, this object has two methods:
- get_common(): Returns a list of the Objects common to all Bundles. Each common Object is captured in a dictionary with the structure shown below, which holds the matching properties of the Object along with the instance(s) of the Object found in each Bundle (more than one Object in a Bundle may match). The instances are represented as a separate dictionary, with the keys being the IDs of the Bundles and the values being lists of the IDs of the matching Objects:
common_objects = [{"object": "<matching object properties>",
                   "object_instances": {"bundle_id_1": ["matching_object_id_1", "matching_object_id_n", ...],
                                        "bundle_id_n": ...}}]
- get_unique(): Returns a dictionary of the Objects unique to each Bundle. The keys are the IDs of the Bundles, and the values are lists of the IDs of the Objects unique to their respective Bundles:
unique_objects = {"bundle_id_1": ["unique_object_id_1", "unique_object_id_n", ...],
                  "bundle_id_n": ...}
The following is a sample code snippet that reads in two MAEC Bundle instances (assumed to be XML files on disk), performs the comparison between them using a simple match_on dictionary, and then pretty prints the common and unique Objects found.
import pprint
import maec.bindings.maec_bundle as maec_bundle_binding
from maec.bundle.bundle import Bundle
# Matching properties dictionary
match_on = {'FileObjectType': ['file_name'],
'WindowsRegistryKeyObjectType': ['hive', 'key']}
# Parse in the input Bundle documents and create their python-maec Bundle class representations
bundle1 = Bundle.from_obj(maec_bundle_binding.parse("bundle1.xml"))
bundle2 = Bundle.from_obj(maec_bundle_binding.parse("bundle2.xml"))
# Perform the comparison and get the results
comparison_results = Bundle.compare([bundle1, bundle2], match_on)
# Pretty print the common and unique Objects
print("******Common Objects:*******\n")
pprint.pprint(comparison_results.get_common())
print("****************************")
print("******Unique Objects:*******\n")
pprint.pprint(comparison_results.get_unique())
print("****************************")
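Beyond pretty printing, the structures returned by get_common() and get_unique() can be post-processed with ordinary list and dictionary operations. The sketch below does not call python-maec; it uses hand-written sample data in the documented shapes, with placeholder IDs, and the function `summarize` is a hypothetical name introduced for illustration.

```python
# Hand-written sample data in the documented shapes; all IDs are placeholders.
common_objects = [
    {"object": "<matching object properties>",
     "object_instances": {"bundle_id_1": ["object_id_1", "object_id_2"],
                          "bundle_id_2": ["object_id_7"]}},
]
unique_objects = {"bundle_id_1": ["object_id_3"],
                  "bundle_id_2": ["object_id_8", "object_id_9"]}

def summarize(common_objects, unique_objects):
    """Count, per Bundle ID, the Object instances that were common and unique."""
    summary = {}
    for entry in common_objects:
        for bundle_id, object_ids in entry["object_instances"].items():
            counts = summary.setdefault(bundle_id, {"common": 0, "unique": 0})
            counts["common"] += len(object_ids)
    for bundle_id, object_ids in unique_objects.items():
        counts = summary.setdefault(bundle_id, {"common": 0, "unique": 0})
        counts["unique"] += len(object_ids)
    return summary

summary = summarize(common_objects, unique_objects)
```

With real data, the two arguments would be the return values of comparison_results.get_common() and comparison_results.get_unique().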
To try a real example, please see the comparator example (which also includes some real Bundle files for testing) in our examples directory: https://github.com/MAECProject/python-maec/blob/master/examples/comparator_example.py
At the moment the Comparator is very much an ALPHA capability but we hope it will be useful for those trying to perform basic analytics with MAEC data. Some of the things we wish to do in the future are:
- Add support for matching on Actions and Behaviors
- Expand and refine the comparator and comparison results interface
- Add support for dealing with embedded Bundles (e.g. in Packages)
- Expand support for defining how matches are performed
  - Customizable logic (e.g. 'OR' instead of 'AND')
  - Fuzzy matching