This work introduces new features based on a higher level of abstraction of network traffic, called Flow Aggregation features. This repository extracts two of them: number of flows and source ports delta.
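As a rough illustration of what aggregating over flows can look like, the sketch below groups flow records by source IP and derives two per-source values. The record keys `src_ip` and `src_port`, the output key `src_port_delta`, and the definition of the delta (largest minus smallest source port seen for that source IP) are assumptions made for this example. They are not taken from the repository's code, whose actual definitions may differ.

```python
from collections import defaultdict


def aggregate_flows(flows):
    """Attach two illustrative aggregation features to each flow record.

    ASSUMPTION: `flows` is a list of dicts with the hypothetical keys
    'src_ip' and 'src_port'. The repository's real column names and
    feature definitions may differ.
    """
    # Collect every source port observed for each source IP.
    ports_by_src = defaultdict(list)
    for flow in flows:
        ports_by_src[flow["src_ip"]].append(flow["src_port"])

    enriched = []
    for flow in flows:
        ports = ports_by_src[flow["src_ip"]]
        enriched.append({
            **flow,
            # Number of flows sharing this flow's source IP.
            "num_src_flows": len(ports),
            # ASSUMED definition: spread of source ports for this source IP.
            "src_port_delta": max(ports) - min(ports),
        })
    return enriched


# Hypothetical sample records, for illustration only.
sample_flows = [
    {"src_ip": "10.0.0.1", "src_port": 1000},
    {"src_ip": "10.0.0.1", "src_port": 1500},
    {"src_ip": "10.0.0.2", "src_port": 443},
]
enriched_flows = aggregate_flows(sample_flows)
```

The point of the sketch is only that each flow ends up carrying information about the other flows from the same source, which a per-flow feature cannot express.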
To be updated.
Argument | Usage | Default | Values and Notes |
---|---|---|---|
--normal_path | CICIDS2017 Dataset Monday File Path | csv_files/biflow_Monday-WorkingHours_Fixed_Hour_0.csv | |
--attack_paths | CICIDS2017 Dataset Attack Paths | csv_files/biflow_Wednesday-WorkingHours_Slowhttptest.csv | Comma separated |
--drop_aggregation | Whether or not to drop flow aggregation features | 0 | 0/1 |
--slice_normal | Whether or not to use a portion of normal file | 0 | 0/1 |
--slice_attacks | Whether or not to use portion(s) of attack file(s) | 0 | 0/1 (comma separated) |
--slice_normal_percent | Percentage of the portion to use from normal file | 0 | |
--slice_attacks_percent | Percentage of the portion(s) to use from attack file(s) | 0 | Comma separated |
--normal_slice_no | Slice number to use when slicing the normal file | 0 | |
--slice_attacks_number | Slice number(s) to use when slicing the attack files | 0 | Comma separated |
--number_of_features | Number of RFE features to print | 5 | Only used in print_features script |
--output | The output file name | result.csv | |
--choose_features | Whether or not to run RFE | 0 | 0/1 (check selected_features argument) |
--selected_features | The features to use during training | 'fwd_mean_pkt_len, bwd_mean_pkt_len, fwd_min_pkt_len, bwd_min_pkt_len, fwd_max_pkt_len,num_src_flows, src_ip_dst_prt_delta' | |
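Several of the arguments above take comma-separated values. One way such arguments could be parsed with the standard `argparse` module is sketched below. Only a subset of the arguments is shown, the helper `split_commas` is a hypothetical name, and the file names `a.csv` and `b.csv` are placeholders. The repository's scripts may parse their arguments differently.

```python
import argparse


def split_commas(value):
    """Split a comma-separated argument into a list of stripped strings."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser():
    """Build a parser for a subset of the documented arguments (sketch only)."""
    parser = argparse.ArgumentParser(description="Flow aggregation argument sketch")
    parser.add_argument(
        "--normal_path",
        default="csv_files/biflow_Monday-WorkingHours_Fixed_Hour_0.csv",
    )
    parser.add_argument(
        "--attack_paths",
        type=split_commas,
        default=["csv_files/biflow_Wednesday-WorkingHours_Slowhttptest.csv"],
    )
    # 0/1 flags are modelled as integers restricted to the two allowed values.
    parser.add_argument("--drop_aggregation", type=int, choices=[0, 1], default=0)
    parser.add_argument("--slice_attacks", type=split_commas, default=["0"])
    parser.add_argument("--output", default="result.csv")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--attack_paths", "a.csv, b.csv", "--drop_aggregation", "1"]
)
```

Keeping the comma handling in a single `type` callable means every list-valued argument is split and stripped the same way.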
Clone this repository.
Run pcap_parser.py [pcap file path]
Run print_features [specify the parameters as required]
Run flow_aggregation.py [specify the parameters as required]
- The output of pcap_parser will be saved as 'pcap_file_name.csv'.
- The print_features script displays the RFE-ranked features in order (the slicing parameters apply only if set; by default the whole files are used).
- The flow_aggregation script outputs the classification accuracy and the confusion matrices for 5-fold cross-validation.
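To make the last point concrete, the following is a minimal, standard-library-only sketch of 5-fold cross-validation that reports an accuracy and a 2x2 confusion matrix per fold. It does not reproduce the repository's classifier or its fold strategy: the single-feature threshold model, the function names, and the toy data are all assumptions made for illustration, and the folds here are contiguous, without shuffling or stratification.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 (normal) and 1 (attack)."""
    matrix = [[0, 0], [0, 0]]
    for truth, pred in zip(y_true, y_pred):
        matrix[truth][pred] += 1
    return matrix


def cross_validate(X, y, fit, predict, k=5):
    """Run k-fold cross-validation and return (accuracy, matrix) per fold."""
    results = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_true = [y[i] for i in test_idx]
        y_pred = [predict(model, X[i]) for i in test_idx]
        cm = confusion_matrix(y_true, y_pred)
        accuracy = (cm[0][0] + cm[1][1]) / len(test_idx)
        results.append((accuracy, cm))
    return results


def fit_threshold(X_train, y_train):
    """HYPOTHETICAL toy model: midpoint between the two class means."""
    normal = [x for x, label in zip(X_train, y_train) if label == 0]
    attack = [x for x, label in zip(X_train, y_train) if label == 1]
    return (sum(normal) / len(normal) + sum(attack) / len(attack)) / 2


def predict_threshold(threshold, x):
    """Label a sample as attack (1) when its feature exceeds the threshold."""
    return 1 if x > threshold else 0


# Toy single-feature data, for illustration only (not from CICIDS2017).
X = [1, 9, 2, 8, 3, 10, 1, 9, 2, 8]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
fold_results = cross_validate(X, y, fit_threshold, predict_threshold, k=5)
for fold_number, (accuracy, cm) in enumerate(fold_results, start=1):
    print(f"fold {fold_number}: accuracy={accuracy:.2f}, confusion_matrix={cm}")
```

Passing `fit` and `predict` as arguments keeps the fold logic independent of the model, so a real classifier could be substituted without touching the evaluation loop.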