Dataset #2

Open
s-sabareeswaran opened this issue Jan 15, 2019 · 8 comments

Comments

@s-sabareeswaran

Please upload your data set

@ZaydH

ZaydH commented Jan 16, 2019

Looking more closely at the repository, it appears that data.npz contains the dataset. It has 441 benign samples and 1,368 malware files. The difference between data.npz and data1.npz is not clear to me. README.md states "I have used 3000 malware samples and 1500 benign samples for trainning and testing(will expand further)." I could not find that set, however.
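For anyone who wants to check the sample counts themselves, here is a minimal sketch of how the arrays in an `.npz` archive can be listed with NumPy. The array names `benign` and `malware` and the file `demo.npz` are hypothetical stand-ins created only for this demo; the actual key names inside data.npz and data1.npz are not stated in this thread, so inspect `archive.files` on the real files.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {array_name: shape} for every array stored in an .npz archive."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


# Build a small synthetic archive so the sketch runs on its own.
# "benign" and "malware" are hypothetical names, not the repository's keys.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.npz")
np.savez(demo_path, benign=np.zeros((3, 5)), malware=np.ones((7, 5)))

summary = summarize_npz(demo_path)
for name, shape in summary.items():
    # The first dimension is usually the number of samples.
    print(name, shape)
```

Running `summarize_npz` on both data.npz and data1.npz and comparing the results may also help show how the two files differ.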

@s-sabareeswaran
Author

Hey ZaydH, can I contact you through Skype or Zoom to understand this code? I'm still not clear on it.

@ZaydH

ZaydH commented Jan 17, 2019

Skype will be difficult. If you have questions, I recommend opening issues (one for each question). I can try answering them if I think the question is within my wheelhouse. In the end, my knowledge of the code in this repository is very limited. I have just tried running it and stepping through it in the debugger. @yanminglai is the expert here -- not me.

@rnehra01

Can you tell me the names of the zip files you downloaded for the dataset? I'm trying to create adversarial malware and test it on commercially used software, but getting the features from Cuckoo takes time, so I was hoping you could provide the files, and then I would use the features you extracted.

@ZaydH

ZaydH commented Apr 10, 2019

@rnehra01 -- I am not sure if you are asking me or asking @yanminglai. If you are asking me, I implemented my own version of this network using PyTorch. Details of the dataset I used are described in my project's GitHub repository.

@rnehra01

Actually, I'm asking about the original malware files from which the API calls were extracted. I checked your repo, but it has the same type of data as is available here. BTW @ZaydH, do you happen to know of a publicly available dataset with more features (other than just API calls), so that I don't have to use Cuckoo to extract them?

@ZaydH

ZaydH commented Apr 11, 2019

@rnehra01 I am unsure what you mean here. I only uploaded @yanminglai's NumPy arrays to my repo.

However, as I describe in the README.md, I did not use those files for my experiments. I used the SLEIPNIR dataset. The creators of that dataset requested that it not be publicly posted, which I respected. However, you can request access through this online form. Have you tried this, and did it not work for you? The SLEIPNIR dataset has about 22,000 features.

@rnehra01

Oh, my bad. I only looked in the data folder and didn't read carefully. I have filled out the Google form. Thanks for pointing that out.
