GSoC 2019 Project Idea : Windows support for CVE Binary Tool #53

Closed
terriko opened this issue Feb 8, 2019 · 10 comments
Labels
gsoc Tasks related to our participation in Google Summer of Code

Comments

@terriko
Contributor

terriko commented Feb 8, 2019

The CVE Binary Tool team is hoping to participate in Google Summer of Code (GSoC) under the Python Software Foundation umbrella. You can read all about what this means at http://python-gsoc.org/. This issue, and any others tagged 'gsoc', are not general bugs but project ideas for GSoC.

Project Idea : Windows support for CVE Binary Tool

Project description: The CVE Binary Tool was designed for use on Linux, and thus makes assumptions about the availability of command-line utilities, but it doesn't have to be that way. The two utilities it uses for parsing files are `file` (which gives you file type information) and `strings` (which gives you a list of the strings found in a given binary). Both can be written in pure Python, making the CVE Binary Tool platform-independent.
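As a rough illustration of the pure-Python route, here is a minimal sketch of a `strings` replacement. It assumes the behaviour worth matching is "runs of at least four printable ASCII bytes"; it does not find UTF-16 strings (which Windows binaries often contain), and the names `strings` and `strings_from_file` are hypothetical, not part of the CVE Binary Tool.

```python
import re
from typing import List


def strings(data: bytes, min_length: int = 4) -> List[str]:
    """Return runs of printable ASCII (space to '~', plus tab) in ``data``.

    Only runs of at least ``min_length`` bytes are kept, similar to the
    default behaviour of the ``strings`` utility. UTF-16 text is not found.
    """
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def strings_from_file(path: str, min_length: int = 4) -> List[str]:
    """Read a whole file and return the printable strings it contains."""
    with open(path, "rb") as handle:
        return strings(handle.read(), min_length)
```

For example, `strings(b"\x00\x01curl 7.32.0\x00ab\x00libcurl\xff")` returns `['curl 7.32.0', 'libcurl']`; the two-character run `ab` is dropped.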

(Note that it is actually quite possible to run the CVE Binary Tool on Windows right now if you have those utilities installed through something like Cygwin or Windows Subsystem for Linux, but for this task we're hoping you could run it on a fresh Windows install, and that we'd have the tests to prove it.)

The CVE Binary Tool also uses a number of system utilities for extracting files from various archive formats (from apk to zip files!). These utilities may also have different names on different platforms, so investigate how to deal with them more smoothly. This could possibly also be done in pure Python; alternatively, we could use platform-specific utilities and do appropriate checks to make sure they're installed (or suggest them to the user).
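One way to smooth over tools that go by different names is to look for several candidate names and use whichever is installed. A minimal sketch, assuming a hypothetical `find_tool` helper; `shutil.which` is the standard-library lookup and, on Windows, also tries the extensions listed in `PATHEXT`.

```python
import shutil
from typing import Optional, Sequence


def find_tool(candidates: Sequence[str]) -> Optional[str]:
    """Return the full path of the first candidate name found on PATH.

    shutil.which() honours PATHEXT on Windows, so "7z" also matches
    "7z.exe" there. Returns None when no candidate is installed.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```

As an illustration, `find_tool(["7z", "7za"])` would return whichever 7-Zip binary is on `PATH`, or `None`, so the caller can print an install suggestion instead of failing inside `subprocess`.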

Skills: Python, git, multi-platform development

Difficulty level: Intermediate

Related Readings/Links: None at this time.

Potential mentors: @terriko @pdxjohnny @WhataTiberius

Getting Started: There's no "easy" issue that makes a good first commit here, so see the "Getting started" instructions in #24 for setting up your first test.

Another possibly good first test is a "real file" test of the checkers. Details on how to add one are available in #107. Short version: your test will look like this:

    @unittest.skipUnless(os.getenv('LONG_TESTS') == '1', 'Skipping long tests')
    def test_rpm_curl_7_32_0(self):
        """
        test to see if we detect a real copy of curl 7.32.0
        """
        self._file_test(
            'https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/20/Everything/x86_64/os/Packages/c/',
            'curl-7.32.0-3.fc20.x86_64.rpm',
            'curl',
            '7.32.0')

And when you test it locally, you'll need to make sure you have LONG_TESTS enabled, so this one would have to be run as follows:

    LONG_TESTS=1 python -m unittest test.test_scanner.TestScanner.test_rpm_curl_7_32_0
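The `LONG_TESTS=1 python ...` form is POSIX shell syntax; in Windows `cmd.exe` you would run `set LONG_TESTS=1` first, and in PowerShell `$env:LONG_TESTS = "1"`. A shell-independent alternative is a small Python helper that sets the variable only for a child process. This is a sketch; the helper names are hypothetical and it assumes you run it from the repository root.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple


def long_test_command(test_name: str) -> Tuple[List[str], Dict[str, str]]:
    """Return the command line and environment for one long-running test.

    LONG_TESTS is set only in the child's environment, so this works the
    same whether it is launched from cmd.exe, PowerShell or a POSIX shell.
    """
    env = dict(os.environ, LONG_TESTS="1")
    command = [sys.executable, "-m", "unittest", test_name]
    return command, env


def run_long_test(test_name: str) -> int:
    """Run the test in a child interpreter and return its exit code."""
    command, env = long_test_command(test_name)
    return subprocess.call(command, env=env)
```

`run_long_test("test.test_scanner.TestScanner.test_rpm_curl_7_32_0")` returns the unittest exit code; using `sys.executable` keeps the test on the same interpreter that runs the helper.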

What we want to see in your application: Describe how you intend to solve the problem, and give us some "stretch goals" of other platform work you could do once Windows is working (e.g. would you tackle Mac OS next? Improve test coverage? Do you have a feature you'd want to add once this is done?). Don't forget to include some time for building appropriate tests. We think that in an ideal situation, Windows support won't take the full summer, so there's a good chance you'd get to work on the stretch goal once the main project is complete.

@terriko terriko added the gsoc Tasks related to our participation in Google Summer of Code label Feb 8, 2019
@giridharprasath
Contributor

Hi @terriko, I'm interested in working on this. Correct me if I'm wrong, but my understanding is that on the Windows platform we have to extract strings (versions) from DLLs, like we do for .so files.

Any further reading or hints would be helpful.

@terriko
Contributor Author

terriko commented Feb 11, 2019

I'm not a Windows person, so I don't know, but what do you get when you run strings on a DLL? What do you get if you run the tool against a DLL version of something it already detects? Does it work at all?

@giridharprasath
Contributor

A strings implementation is available for Windows.

To get the curl version:

    curl -V

To identify the strings in curl.exe that contain that version:

    strings curl.exe | findstr $version
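For comparison, a pure-Python counterpart of `strings curl.exe | findstr $version` only needs a regular expression and a substring check, so it behaves the same on Windows and Linux. This is a sketch; the helper name is hypothetical and, like `findstr` without `/I`, the match is case-sensitive.

```python
import re
from typing import List

# Runs of at least four printable ASCII bytes, roughly what `strings` prints.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e\t]{4,}")


def find_version_strings(data: bytes, version: str) -> List[str]:
    """Return printable strings in ``data`` that contain ``version``.

    Mirrors ``strings <binary> | findstr <version>``: extract printable
    runs, then keep those containing the (case-sensitive) version text.
    """
    needle = version.encode("ascii")
    return [
        match.group().decode("ascii")
        for match in PRINTABLE_RUN.finditer(data)
        if needle in match.group()
    ]
```

You would pass it the bytes of `curl.exe` and the version string reported by `curl -V`.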

@terriko
Contributor Author

terriko commented Feb 13, 2019

So, I sort of envision this as happening in phases:

Phase 1: run it on a bunch of different versions of Windows and see what fails. Then make sure there are appropriate error messages for all the failures, e.g. if it fails because it can't find "strings", tell the user how to install the Windows equivalent.
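For the Phase 1 error messages, one possible shape is a table of per-platform install hints keyed by tool name, with a generic fallback. The `INSTALL_HINTS` table, its wording, and the helpers `missing_tool_message` and `require_tool` are all hypothetical placeholders; `sys.platform` is `"win32"` on Windows, `"linux"` on Linux and `"darwin"` on macOS.

```python
import shutil
import sys

# Hypothetical per-platform hints; the wording is illustrative only.
INSTALL_HINTS = {
    "cabextract": {
        "linux": "install the 'cabextract' package with your package manager",
        "win32": "install a Windows build of 'cabextract' and add it to PATH",
    },
}
DEFAULT_HINT = "install '%s' and make sure it is on your PATH"


def missing_tool_message(tool, platform_name=sys.platform):
    """Build an error message with an install hint for the given platform."""
    hint = INSTALL_HINTS.get(tool, {}).get(platform_name, DEFAULT_HINT % tool)
    return "Error: need '%s' binary in path. To fix this, %s." % (tool, hint)


def require_tool(tool):
    """Return the tool's path, or raise with a platform-specific hint."""
    path = shutil.which(tool)
    if path is None:
        raise RuntimeError(missing_tool_message(tool))
    return path
```

Keeping the hints in data rather than in `if` branches makes it easy to add macOS wording later without touching the extractor code.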

Phase 2: write code so it works without failures, either by doing Windows checks and using the appropriate Windows built-in tools, or by using Python to do the equivalent (or possibly a combination of both). At this point, you should be able to download files such as RPMs to a Windows system and scan them correctly.

Phase 3: look more deeply into what Windows support would mean (adding DLL scans, handling .exe files that aren't self-extracting zip files, etc.) and then figure out how to improve those too.
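For the DLL and .exe work in Phase 3, a pure-Python check of the PE header can tell DLLs from executables on any OS, without relying on `file`. A minimal sketch: it reads the `MZ` DOS header, follows the offset stored at 0x3C (`e_lfanew`) to the `PE\0\0` signature, and tests the `IMAGE_FILE_DLL` (0x2000) flag in the COFF header's Characteristics field. It assumes `data` holds the start of the file (a few KB is typical, though `e_lfanew` may in principle point further in); the function name `pe_kind` is hypothetical.

```python
import struct
from typing import Optional

IMAGE_FILE_DLL = 0x2000  # flag in the COFF header's Characteristics field


def pe_kind(data: bytes) -> Optional[str]:
    """Return "dll" or "exe" for a PE image, or None if ``data`` is not one.

    ``data`` should hold the start of the file, up to and including the
    20-byte COFF header that follows the PE signature.
    """
    # MS-DOS header: "MZ" magic, with the PE header offset (e_lfanew) at 0x3C.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # Signature (4 bytes) + COFF header (20 bytes) must fit in ``data``.
    if pe_offset + 24 > len(data) or data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return None
    # Characteristics is the last 2-byte field of the COFF header.
    (characteristics,) = struct.unpack_from("<H", data, pe_offset + 22)
    return "dll" if characteristics & IMAGE_FILE_DLL else "exe"
```

Any PE image without the DLL flag is reported as `"exe"` here, which also lumps in drivers and other non-DLL images; a real scanner would probably want finer distinctions.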

But as I said, I'm really not a Windows person (I seriously haven't done Windows-based development in over a decade), so if that's a nonsensical plan, it's totally reasonable to come up with a better proposal and explain why it's the more logical way to go!

@aydwi

aydwi commented Feb 17, 2019

Hi @terriko and @pdxjohnny. I'm interested in working on this during the summer. Should this be considered a complete Windows port of the CVE Binary Tool? I've only started to look through the codebase, but I'd prefer writing core functionality in pure Python, which can make porting to new platforms easier. Further, I like the idea of extending this to OS X as well.

Again, I'm just getting started with this project. I'll read up on the documentation and use cases, explore the tool, and try to come up with a structured approach to implementing this. Perhaps I can come up with an idea or two of my own. I'll follow up soon.

@terriko
Contributor Author

terriko commented Feb 28, 2019

@aydwi It depends on what you mean by "complete Windows port". To me, "complete Windows port" sounds like "reimplement everything", and that's not what this will be.

We're talking about a number of `if windows:` statements to deal with the dependencies that aren't installed as standard on Windows, appropriate error messages for Windows users, work to handle standard Windows formats (but not only on Windows: we want Linux systems to be able to scan Windows formats and vice versa), and, if possible, setting up CI for Windows testing.
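Because detecting a format by its content does not depend on the host OS, a signature table is one way to let Linux scan Windows formats (such as .cab) and vice versa, in the spirit of replacing `file`. The table below is a partial, hypothetical sketch using well-known magic numbers; APK and JAR files are zip archives, so they are reported as `"zip"`, and compressed tarballs show up as their outer compression format.

```python
from typing import Optional

# (offset, magic bytes, format name), checked in order; "deb" must come
# before the generic "ar" entry because a .deb is an ar archive.
SIGNATURES = [
    (0, b"PK\x03\x04", "zip"),
    (0, b"\x1f\x8b", "gzip"),
    (0, b"BZh", "bzip2"),
    (0, b"\xfd7zXZ\x00", "xz"),
    (0, b"\xed\xab\xee\xdb", "rpm"),
    (0, b"MSCF", "cab"),
    (0, b"!<arch>\ndebian-binary", "deb"),
    (0, b"!<arch>\n", "ar"),
    (257, b"ustar", "tar"),  # POSIX/GNU tar headers carry "ustar" at 257
]


def identify(header: bytes) -> Optional[str]:
    """Return a format name for the first matching signature, else None."""
    for offset, magic, name in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return name
    return None


def identify_file(path: str) -> Optional[str]:
    """Identify a file from its first 512 bytes."""
    with open(path, "rb") as handle:
        return identify(handle.read(512))
```

The extractor could then dispatch on the returned name instead of on the file extension, which also catches misnamed files.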

Incidentally, for OS X support we'll want to test with the OS X-specific versions of Python, so it would also be nice to have CI for that if we can.

@pdxjohnny
Member

Since `strings` and `file` have been eliminated, all that needs to happen now (I think) is to replace the calls to Linux utilities in the extractor with Python code that does the same thing:

    # Module-level imports used by the methods below.
    import glob
    import os
    import subprocess

    # Methods excerpted from the extractor class; inpath() and popen_ctx()
    # are helpers defined elsewhere in the project.

    @classmethod
    def extract_file_tar(cls, filename, extraction_path):
        """ Extract tar files """
        if not inpath('tar'):
            print("Error: need 'tar' binary in path")
        else:
            return subprocess.call(
                ["tar", "-C", extraction_path, "-axf", filename])

    @classmethod
    def extract_file_rpm(cls, filename, extraction_path):
        """ Extract rpm packages """
        if not inpath('rpm2cpio') or not inpath('cpio'):
            print("Error: need 'rpm2cpio' and 'cpio' binaries in path")
        else:
            # Pipe rpm2cpio's output into cpio, extracting inside extraction_path
            with popen_ctx(["rpm2cpio", filename], stdout=subprocess.PIPE) as proc:
                return subprocess.call(["cpio", "-idmv"], stdin=proc.stdout,
                                       cwd=extraction_path)

    @classmethod
    def extract_file_deb(cls, filename, extraction_path):
        """ Extract debian packages """
        if not inpath('ar'):
            print("Error: need 'ar' binary in path")
        else:
            # A .deb is an ar archive: unpack it, then unpack its data.tar.* member
            result = subprocess.call(["ar", "x", filename], cwd=extraction_path)
            if result != 0:
                return result
            if not inpath('tar'):
                print("Error: need 'tar' binary in path")
            else:
                datafile = glob.glob(os.path.join(extraction_path, "data.tar.*"))[0]
                result = subprocess.call(["tar", "-C", extraction_path, "-axf", datafile])
                return result

    @classmethod
    def extract_file_cab(cls, filename, extraction_path):
        """ Extract cab files """
        if not inpath('cabextract'):
            print("Error: need 'cabextract' binary in path")
        else:
            return subprocess.call(
                ["cabextract", "-d", extraction_path, filename])

    @classmethod
    def extract_file_zip(cls, filename, extraction_path):
        """ Extract zip files """
        if not inpath('unzip'):
            print("Error: need 'unzip' binary in path")
        else:
            return subprocess.call(
                ["unzip", "-qq", "-n", "-d", extraction_path, filename])
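As an example of replacing one of these calls, the `ar x` step in `extract_file_deb` can be done in pure Python, since a .deb is a plain `ar` archive: an 8-byte `!<arch>\n` magic followed by members, each with a 60-byte text header (name, timestamp, owner, group, mode, decimal size and a two-byte terminator) and data padded to an even length. This is a sketch with hypothetical names; it ignores GNU long-name tables and BSD `#1/` names, which .deb members do not normally need.

```python
import os
from typing import Iterator, List, Tuple

AR_MAGIC = b"!<arch>\n"
AR_HEADER_SIZE = 60


def iter_ar_members(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, payload) for each member of an in-memory ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    offset = len(AR_MAGIC)
    while offset + AR_HEADER_SIZE <= len(data):
        header = data[offset:offset + AR_HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError("corrupt ar header at offset %d" % offset)
        # The name field is 16 bytes; GNU ar appends "/" to names.
        name = header[:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii"))  # decimal, space padded
        start = offset + AR_HEADER_SIZE
        yield name, data[start:start + size]
        offset = start + size + (size % 2)  # payloads are padded to even length


def extract_ar(filename: str, extraction_path: str) -> List[str]:
    """Write each plain-named member of ``filename`` into ``extraction_path``."""
    with open(filename, "rb") as handle:
        data = handle.read()
    written = []
    for name, payload in iter_ar_members(data):
        # Skip GNU symbol/long-name tables ("/", "//", "/123"), BSD "#1/"
        # names and anything that is not a plain file name.
        if (not name or name.startswith("/") or name in (".", "..")
                or os.path.basename(name) != name):
            continue
        target = os.path.join(extraction_path, name)
        with open(target, "wb") as out:
            out.write(payload)
        written.append(target)
    return written
```

The extracted `data.tar.*` member could then be unpacked with the standard `tarfile` module (`tarfile.open(path, "r:*")` detects gzip, bzip2 and xz compression) instead of calling `tar`.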

https://docs.python.org/3/library/shutil.html#archiving-operations may be useful here
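Following that pointer, here is a sketch that routes the formats `shutil.unpack_archive` understands by default (`zip`, `tar`, `gztar`, `bztar`, `xztar`) through the standard library and leaves rpm, cab and deb to other code. The extension table and function names are hypothetical. Unlike `unzip -n`, `shutil.unpack_archive` overwrites existing files, and for tar archives from untrusted sources it is worth checking whether your Python supports the `filter` argument (added in Python 3.12) to restrict member paths.

```python
import shutil
from typing import Optional

# Hypothetical table: file extensions handled by shutil's built-in formats.
STDLIB_FORMATS = {
    ".zip": "zip",
    ".apk": "zip",  # APK files are zip archives
    ".jar": "zip",
    ".tar": "tar",
    ".tar.gz": "gztar",
    ".tgz": "gztar",
    ".tar.bz2": "bztar",
    ".tar.xz": "xztar",
}


def stdlib_format(filename: str) -> Optional[str]:
    """Return the shutil format name for ``filename``, or None if unsupported."""
    lower = filename.lower()
    for extension, fmt in STDLIB_FORMATS.items():
        if lower.endswith(extension):
            return fmt
    return None


def extract_with_stdlib(filename: str, extraction_path: str) -> bool:
    """Unpack with shutil when possible; return False so callers can fall
    back to another extractor (e.g. for rpm, cab or deb files)."""
    fmt = stdlib_format(filename)
    if fmt is None:
        return False
    shutil.unpack_archive(filename, extraction_path, fmt)
    return True
```

Passing the format explicitly matters for `.apk` and `.jar`, which `shutil` would not recognize from the extension alone.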

@terriko
Contributor Author

terriko commented Mar 8, 2019

As per #97, it sounds like there might be some room for improvement on the performance of strings.
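One possible direction, not benchmarked here: memory-map the file and run a single compiled regular expression over it, so the scanning happens inside the regex engine rather than in a Python-level loop and the file is never copied into one large bytes object. A sketch with hypothetical names, keeping the four-printable-byte rule.

```python
import mmap
import re
from typing import Iterator

PRINTABLE_RUN = re.compile(rb"[\x20-\x7e\t]{4,}")


def strings_in_buffer(buffer) -> Iterator[str]:
    """Yield printable ASCII runs from any bytes-like object (bytes, mmap, ...)."""
    for match in PRINTABLE_RUN.finditer(buffer):
        yield match.group().decode("ascii")


def strings_in_file(path: str) -> Iterator[str]:
    """Memory-map ``path`` and scan it without reading it into a bytes object."""
    with open(path, "rb") as handle:
        try:
            mapped = mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            return  # mmap raises ValueError for an empty file
        with mapped:
            yield from strings_in_buffer(mapped)
```

Because `strings_in_file` is a generator, callers that only need the first few matches (for example, a checker looking for one version string) can stop early.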

@terriko
Contributor Author

terriko commented Mar 21, 2019

Added instructions above on how to add a different type of test, for folk still looking for a first (or 15th) commit.

@terriko
Contributor Author

terriko commented Aug 29, 2019

Completed and released in 0.3.0. Thanks @wzao1515 for all your great research and work to make this happen!

@terriko terriko closed this as completed Aug 29, 2019

4 participants