Project 2

Training and testing data. There are three files: wireshark_preprocess, main, and random_gen_testcase.


The user behavior data consists of three log files (wireshark, sysmon, and security). We preprocess the wireshark log with the 'wireshark_preprocess' file and extract the IP information into "Person_X_IP.txt".
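The extraction step can be illustrated with a minimal sketch. It assumes the wireshark log is available as a plain-text or CSV export and that "IP information" means the IPv4 addresses that appear in it. The names `extract_ips` and `write_person_ips` are hypothetical and are not taken from 'wireshark_preprocess'.

```python
import re
from pathlib import Path

# Matches dotted-quad IPv4 candidates; each octet is range-checked below.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ips(text):
    """Return every valid IPv4 address found in text, in order of appearance."""
    ips = []
    for candidate in IPV4_PATTERN.findall(text):
        if all(0 <= int(octet) <= 255 for octet in candidate.split(".")):
            ips.append(candidate)
    return ips


def write_person_ips(log_path, person_id, out_dir="."):
    """Hypothetical helper: write the IPs from one log to Person_<id>_IP.txt."""
    # Assumption: the log is a text export, one packet per line.
    text = Path(log_path).read_text(encoding="utf-8", errors="ignore")
    out_path = Path(out_dir) / f"Person_{person_id}_IP.txt"
    out_path.write_text("\n".join(extract_ips(text)) + "\n", encoding="utf-8")
    return out_path
```

For example, `write_person_ips("person1_wireshark.csv", 1)` would write `Person_1_IP.txt` in the current directory; the input file name here is only a placeholder.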


After preprocessing, run the main file to get predictions like the ones below:

testcase 1: person 1
testcase 2: person 2 


  • testing_dir = 'Example test' (to test the generated data, change this to 'Test')
  • testing_num = test_num_person = 2 (the number of test cases to predict)
  • train_num_person = 6 (the number of persons in the training data)

Generate test data

To allow more testing, we wrote 'random_gen_testcase', which generates the three files that are needed. No preprocessing is required, because it generates the 'Person_X_IP.txt' file directly.


  • sub_length = 0.2 (the portion of the original training data to extract as testing data)
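How a fraction such as sub_length could be applied is sketched below. This is only an illustration: it assumes the training data is a list of lines and that the test portion is drawn at random, which 'random_gen_testcase' may or may not do. The function name `sample_test_portion` and the `seed` parameter are hypothetical.

```python
import random


def sample_test_portion(lines, sub_length=0.2, seed=None):
    """Return a random fraction of lines, kept in their original order."""
    if not 0.0 <= sub_length <= 1.0:
        raise ValueError("sub_length must be between 0 and 1")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    n_test = int(len(lines) * sub_length)  # rounds down
    chosen = sorted(rng.sample(range(len(lines)), n_test))
    return [lines[i] for i in chosen]
```

With sub_length=0.2 and ten training lines, the function returns two of them. Passing a fixed seed gives the same test case on every run.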

Python environment

The project runs successfully in this environment

