This project represents my Master's thesis in Computer Engineering. It aims to attribute and contextualize large file streams using a rule-based approach.
It supports the three most popular families of attribution rules: YARA, Sigma, and Suricata.
The goal of the project is to provide a complete framework to ingest and attribute malware samples based on these rules.
It builds on several open-source projects for static and dynamic malware analysis, and supports the following kinds of analysis:
- Static analysis and enrichment with IntelOwl
- Dynamic analysis with CAPEv2 and the Elastic Agent
- Retrohunt analysis with Mquery and Elasticsearch
The framework architecture is shown in the following picture.
It is composed of several modules:
- Connectors: Python3 scripts that periodically fetch data from external feeds and upload it into Malstream. There is a module for each source (currently MalwareBazaar and Tria.ge are supported).
- Backend: FastAPI application with Celery for the asynchronous tasks that manage the analysis steps. Celery runs on top of RabbitMQ, which implements three priority queues to distribute the analysis workload.
- Dynamic Analysis module: a custom version of CAPEv2 that supports evaluating Sigma rules via the Elastic Agent installed on the virtual machine. It also exposes APIs to synchronize the Suricata and YARA rules present in CAPEv2.
- Static Analysis and Enrichment module: a custom version of IntelOwl, configured to enable only the YARA and hash analyzers. It also exposes APIs to synchronize the YARA rules present in IntelOwl.
- Retrohunt Analysis module: a custom version of Mquery, with additional endpoints to upload and download malware samples.
- Frontend: developed with ReactJS to manage the rules and the results.
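To illustrate the connector pattern described above, the sketch below normalizes a feed entry and uploads it to the backend. The endpoint URL, payload schema, and helper names are assumptions for illustration, not the project's actual API.

```python
"""Sketch of a Malstream connector (hypothetical endpoint and schema)."""
import hashlib

MALSTREAM_UPLOAD_URL = "http://localhost:8000/api/samples"  # assumed endpoint


def build_upload_payload(sha256: str, raw_bytes: bytes, source: str) -> dict:
    """Normalize a feed entry into an upload payload, verifying the
    hash advertised by the feed before submission."""
    computed = hashlib.sha256(raw_bytes).hexdigest()
    if computed != sha256.lower():
        raise ValueError(f"hash mismatch: expected {sha256}, got {computed}")
    return {"sha256": computed, "size": len(raw_bytes), "source": source}


def upload_sample(session, sha256: str, raw_bytes: bytes, source: str):
    """Push one sample to the backend. `session` is a requests.Session,
    kept injectable so the logic can be exercised without a network."""
    payload = build_upload_payload(sha256, raw_bytes, source)
    return session.post(MALSTREAM_UPLOAD_URL, data=payload,
                        files={"file": raw_bytes}, timeout=30)
```

Verifying the advertised hash before upload protects the pipeline from corrupted feed entries; the real connectors may handle this differently.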
The process used to dynamically evaluate Sigma rules is shown in the following picture.
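To give a flavor of what that evaluation does, here is a toy matcher: events collected during detonation are compared against a rule's detection selection. This is a deliberate simplification of the Sigma specification (no field modifiers, wildcards, or complex conditions), not the matching logic actually used by the framework.

```python
def matches(selection: dict, event: dict) -> bool:
    """True when every field in the selection equals the event's value."""
    return all(event.get(field) == value for field, value in selection.items())


def rule_fires(detection: dict, events: list) -> bool:
    """Toy condition 'selection': the rule fires if any collected event
    matches the selection block."""
    selection = detection["selection"]
    return any(matches(selection, event) for event in events)


# Example: a Sysmon process-creation event matching a simple selection.
detection = {"selection": {"EventID": 1, "Image": r"C:\Windows\System32\cmd.exe"}}
events = [
    {"EventID": 1, "Image": r"C:\Windows\System32\cmd.exe", "User": "victim"},
    {"EventID": 3, "DestinationPort": 443},
]
print(rule_fires(detection, events))  # True
```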
First, install the services from the customized repositories. They contain the changes that allow Malstream to synchronize the rules inside each analyzer. To install these services, follow each project's official manual.
- IntelOwl. The configuration can be found in this project under `configs/analyzer_config.json`. Only a subset of the existing analyzers is enabled, and a custom analyzer is added.
- Mquery.
- CAPEv2. After installing the framework and at least one working virtual machine, configure the snapshot as follows:
  - Install the Elastic Agent without enabling it.
  - Install Sysmon and use your favorite configuration. A very verbose configuration can be found at `configs/sysmonconfig-export.xml`.
  - Place the Elastic Agent configuration template inside the agent installation folder. The template can be found at `configs/elastic-agent.yml`. The CAPEv2 agent will search for it under `%PROGRAMW6432%/Elastic/Agent/elastic-agent.yml`.
  - Create a snapshot that will be used to run the analyses.
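For reference, the Windows-style template path above can be resolved programmatically. This is only a sketch of the environment-variable expansion, not the repository's actual guest-agent code.

```python
import ntpath
import os

# On the analysis VM, %PROGRAMW6432% points to the 64-bit Program Files
# directory. Simulate it here so the snippet runs on any host:
os.environ.setdefault("PROGRAMW6432", r"C:\Program Files")

# ntpath.expandvars understands Windows %VAR% syntax on any platform.
template = ntpath.expandvars(r"%PROGRAMW6432%/Elastic/Agent/elastic-agent.yml")
print(template)  # e.g. C:\Program Files/Elastic/Agent/elastic-agent.yml
```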
To install the Malstream frontend:
- install `node` (currently tested on v20.1.0);
- go into the frontend folder;
- run `npm install` and `npm run start`.
To install the Malstream backend:
- Install Poetry (currently tested on 1.3.2).
- Install RabbitMQ, for example via Docker (currently tested on RabbitMQ 3.11.15).
- Run `poetry install`.
- Run the Celery workers. These are the commands with the proper workload distribution:
```sh
poetry run celery -A backend worker -c 1 --loglevel=info -Q retrohunt,sandbox -n retrohunt_node
poetry run celery -A backend worker -c 2 --loglevel=info -Q enrichment,sandbox -n enrichment_node
poetry run celery -A backend worker -c 6 --loglevel=info -Q sandbox -n sandbox_node
```
This configuration was tested on a host with 4 CPUs and 8 GB of RAM, with 2 VMs configured in the CAPEv2 framework.
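A queue layout matching the worker commands above could be declared like this in the Celery configuration; the task module names and file location are illustrative assumptions, not the project's actual layout.

```python
# Possible Celery configuration (e.g. backend/celeryconfig.py) mirroring
# the -Q flags above; module names are illustrative.

broker_url = "amqp://guest:guest@localhost:5672//"  # RabbitMQ

# One queue per workload class; each worker node subscribes to its
# primary queue plus the shared sandbox queue.
task_routes = {
    "backend.tasks.retrohunt.*": {"queue": "retrohunt"},
    "backend.tasks.enrichment.*": {"queue": "enrichment"},
    "backend.tasks.sandbox.*": {"queue": "sandbox"},
}
task_default_queue = "sandbox"
```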
- Matteo Corradini ([email protected])
A special thanks to the maintainers of IntelOwl, CAPEv2 and Mquery.