GSoC 2019 Project Idea : Windows support for CVE Binary Tool #53

terriko · 2019-02-08T00:47:12Z

The CVE Binary tool team is hoping to participate in Google Summer of Code (GSoC) under the Python Software Foundation umbrella. You can read all about what this means at http://python-gsoc.org/. This issue, and any others tagged 'gsoc', are not generally available bugs but project ideas for GSoC.

Project Idea : Windows support for CVE Binary Tool

Project description: The CVE Binary Tool was designed for use on Linux, and thus makes assumptions about the availability of command line utilities, but it doesn't have to be that way. The two utilities it uses for parsing files are file (gives you file type information) and strings (gives you a list of strings found in a given binary). These can be written in pure python, allowing the CVE Binary Tool to be architecture independent.
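
For example, a minimal pure-Python stand-in for `file` only needs to recognise the formats the tool already handles, using their leading "magic" bytes. This is a sketch under that assumption, not the project's implementation: `MAGIC_SIGNATURES`, `sniff_bytes` and `sniff_file_type` are hypothetical names, and the signature table covers only a handful of formats.

```python
# Hypothetical signature table: a small pure-Python stand-in for `file`,
# covering only formats the tool already handles. Not exhaustive.
MAGIC_SIGNATURES = [
    (b"\x7fELF", "elf"),           # Linux/Unix executables and .so libraries
    (b"MZ", "pe"),                 # Windows .exe/.dll (DOS stub of a PE file)
    (b"PK\x03\x04", "zip"),        # zip, jar, apk
    (b"\x1f\x8b", "gzip"),         # .gz / .tar.gz
    (b"\xed\xab\xee\xdb", "rpm"),  # RPM lead
    (b"!<arch>\n", "ar"),          # ar archives, including .deb packages
    (b"MSCF", "cab"),              # Microsoft cabinet files
]


def sniff_bytes(header):
    """Classify a file from its first bytes (at least 262 are needed to spot tar)."""
    for magic, label in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return label
    if header[257:262] == b"ustar":  # POSIX/GNU tar marker at offset 257
        return "tar"
    return "unknown"


def sniff_file_type(path):
    """Read just enough of the file to classify it."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(512))
```

Because the check is a plain byte comparison, it behaves the same on Linux, windows and OS X, and a Linux host can classify a windows binary (and vice versa).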

(Note that it is actually quite possible to run the CVE Binary Tool on Windows right now, if you have those utilities installed through something like cygwin or windows subsystem for linux, but we're hoping for this task that you could pretty much run it on a fresh windows install, and that we'd have the tests to prove it.)

The CVE Binary Tool also uses a number of system utilities for extracting files from various archive formats (from apk to zip files!). These utilities may also have different names on different platforms. Investigate how to deal with those more smoothly. It's possible this could also be done in pure python, or we could use utilities that are platform specific and do appropriate checks to make sure they're installed (or suggest them to the user).
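
One way to handle "same job, different name per platform" is to try a list of candidate executables and report which one was found. The sketch below is hypothetical (`find_first_tool` is not part of the codebase); it relies on `shutil.which`, which on windows also consults `PATHEXT`, so a bare name like `expand` can resolve to `expand.exe`.

```python
import shutil


def find_first_tool(candidates):
    """Return (name, full_path) for the first candidate found on PATH, else None.

    shutil.which() honours PATHEXT on windows, so names can be given
    without the .exe suffix.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return name, path
    return None
```

For instance, `find_first_tool(["cabextract", "expand"])` would pick cabextract on Linux and could fall back to windows' built-in `expand.exe` for cab files. The two tools take different arguments, so the caller would still branch on the returned name.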

Skills: Python, git, multi-platform development

Difficulty level: Intermediate

Related Readings/Links: None at this time.

Potential mentors: @terriko @pdxjohnny @WhataTiberius

Getting Started: There's no "easy" issue that makes a good first commit here, so see the "Getting started" instructions in #24 for setting up your first test.

Another possible good first test is a "real file" test of the checkers. Details on how to add one are available in #107. Short version: your test will look like this:

    @unittest.skipUnless(os.getenv('LONG_TESTS') == '1', 'Skipping long tests')
    def test_rpm_curl_7_32_0(self):
        """
        test to see if we detect a real copy of curl 7.32.0
        """
        self._file_test(
            'https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/20/Everything/x86_64/os/Packages/c/',
            'curl-7.32.0-3.fc20.x86_64.rpm',
            'curl',
            '7.32.0')

And when you test it locally, you'll need to make sure you have LONG_TESTS enabled, so this one would have to be run as follows:

    LONG_TESTS=1 python -m unittest test.test_scanner.TestScanner.test_rpm_curl_7_32_0
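
The `LONG_TESTS=1 python ...` form is POSIX shell syntax. In cmd.exe you would run `set LONG_TESTS=1` before the python command, and in PowerShell `$env:LONG_TESTS = "1"`. A small Python launcher sidesteps the shell differences entirely; `build_long_test_command` and `run_long_test` are hypothetical helpers sketched here, not part of the repository.

```python
import os
import subprocess
import sys


def build_long_test_command(test_id):
    """Return (argv, env) for running one unittest with LONG_TESTS=1 on any OS."""
    env = dict(os.environ, LONG_TESTS="1")  # copy of the current environment plus the flag
    argv = [sys.executable, "-m", "unittest", test_id]  # same interpreter as the caller
    return argv, env


def run_long_test(test_id):
    """Run the test in a child process and return its exit code (0 means it passed)."""
    argv, env = build_long_test_command(test_id)
    return subprocess.call(argv, env=env)
```

Run from the repository root, `run_long_test("test.test_scanner.TestScanner.test_rpm_curl_7_32_0")` returns unittest's exit code without changing the caller's own environment.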

What we want to see in your application: Describe how you intend to solve the problem, and give us some "stretch goals" of other platform work you could do once windows is working (e.g. would you tackle Mac OS next? Improve test coverage? Do you have a feature you'd want to add once this is done?). Don't forget to include some time for building appropriate tests. We think that in an ideal situation, Windows support won't take the full summer, so there's a good chance you'd get to work on the "stretch goal" once the main project is complete.


giridharprasath · 2019-02-10T12:23:24Z

Hi @terriko, I'm interested in working on this. Correct me if I'm wrong: to my understanding, on the windows platform we have to extract strings (versions) from dlls, like we do for .so files.

Any further reading or hints would be helpful.

terriko · 2019-02-11T19:20:47Z

I'm not a windows person, so I don't know, but what do you get when you run strings on a dll? What do you get if you try to run the tool against a dll version of something it already detects? Does it work at all?

giridharprasath · 2019-02-12T08:24:37Z

There is a strings implementation for windows.

To get the curl version:

    curl -V

To identify the strings associated with curl.exe:

    strings curl.exe | findstr $version
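
The same search can be done without any external `strings` binary. This sketch assumes "strings" means runs of at least four printable ASCII bytes (GNU strings' default minimum length), and optionally also scans for UTF-16LE text, which windows binaries commonly use for resources such as version information. `extract_strings` and `find_version_strings` are hypothetical names.

```python
import re


def extract_strings(data, min_len=4, wide=False):
    """Return printable runs of at least min_len characters found in `data` (bytes)."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    if wide:
        # UTF-16LE text: each printable ASCII character followed by a zero byte
        wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
        found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found


def find_version_strings(data, version, min_len=4, wide=False):
    """Rough equivalent of `strings file | findstr <version>` on in-memory bytes."""
    return [s for s in extract_strings(data, min_len, wide) if version in s]
```

Usage would look like `with open("curl.exe", "rb") as f: hits = find_version_strings(f.read(), "7.32.0", wide=True)`. Reading the whole file at once keeps the sketch short; very large binaries might call for a chunked or mmap-based scan.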

terriko · 2019-02-13T00:41:43Z

So, I sort of envision this as happening in phases:

Phase 1: run it on a bunch of different versions of windows and see what fails. Then make sure there are appropriate error messages for all the failures, e.g. if it fails because it can't find "strings", tell the user how to install the windows equivalent.

Phase 2: write code so it works without failures -- either by doing windows checks and using the appropriate windows built-in tools, or by using python to do the equivalent (or possibly a combo of both). At this point, you should be able to scan stuff like rpms if you download them to a windows system and have it work correctly.

Phase 3: Look more deeply into what windows support would mean -- adding dll scans, handling .exes that aren't self-executing zipfiles, etc. and then figuring out how to improve those too.
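
As a starting point for the ".exes that aren't self-executing zipfiles" case, a scanner could first check for the DOS "MZ" header and then ask `zipfile` whether a zip archive is appended. This is a hypothetical sketch (`classify_exe` is an illustrative name); note that many installers embed cab, 7z or NSIS data instead of zip, and those would simply be reported as plain "pe" here.

```python
import zipfile


def classify_exe(fileobj):
    """Roughly classify a seekable binary file object as 'zip-sfx', 'pe' or 'not-pe'."""
    fileobj.seek(0)
    if fileobj.read(2) != b"MZ":  # PE images start with the DOS "MZ" header
        return "not-pe"
    # is_zipfile() looks for the end-of-central-directory record at the end of
    # the file, so a zip appended to an executable stub is still recognised.
    if zipfile.is_zipfile(fileobj):
        return "zip-sfx"
    return "pe"
```

A "zip-sfx" result could be routed to the existing zip extraction path, while "pe" files would go to string scanning like any other binary.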

But as I said, I'm really not a windows person (I seriously haven't done windows-based development in over a decade) so if that's a nonsensical plan, it's totally reasonable to come up with a better proposal and explain why it's the more logical way to go!

aydwi · 2019-02-17T10:35:44Z

Hi @terriko and @pdxjohnny. I'm interested in working on this during the summer. Should this be considered as a complete Windows port for CVE Binary Tool? I've only started to look through the codebase, but I'd prefer writing core functionalities in pure Python, which can make porting to newer platforms easier. Further, I like the idea of extending this to OS X as well.

Again, I'm just getting started with this project, I'll read up on the documentation and use cases, explore the tool, and try to come up with a structured approach on how to go about implementing this. Perhaps I can come up with an idea or two of mine. I'll follow up soon.

terriko · 2019-02-28T22:02:32Z

@aydwi It depends on what you mean by "complete windows port" -- To me "complete windows port" sounds like "reimplement everything" and that's not what this will be.

We're talking about a number of if(windows): statements to deal with the dependencies that aren't installed as standard on windows, appropriate error messages for windows users, and then work to deal with standard windows formats (but not doing this only on windows -- we want Linux systems to be able to scan windows formats and vice versa), and if possible setting up CI for windows testing.
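
One shape the `if(windows):` checks could take for the error messages is a branch on `sys.platform` ("win32" on windows, "darwin" on OS X). The helper below is hypothetical and its install advice is deliberately generic; the actual wording and recommended packages would be up to the project.

```python
import sys


def missing_tool_message(tool, platform_name=sys.platform):
    """Build an error message for a missing external tool, tailored to the host OS.

    Hypothetical helper: the install advice is intentionally generic.
    """
    message = f"Error: need '{tool}' binary in path."
    if platform_name.startswith("win"):  # sys.platform is "win32" on Windows
        return message + " On Windows, install a Windows build of it and add its folder to PATH."
    if platform_name == "darwin":  # macOS
        return message + " On macOS, try installing it with a package manager such as Homebrew."
    return message + " Try installing it with your distribution's package manager."
```

Passing `platform_name` explicitly makes each branch testable on any CI machine, which helps with the windows and OS X testing mentioned here.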

Incidentally, for os x support we'll want to test with the os x specific versions of python, so that one would also be nice to have CI for if we can.

pdxjohnny · 2019-03-05T20:48:04Z

Since strings and file have been eliminated, all that needs to happen now (I think) is to replace the calls to linux utils in extractor with Python code that does the same thing.

cve-bin-tool/cve_bin_tool/extractor.py

Lines 60 to 111 in ba44e8e

    
    @classmethod
    def extract_file_tar(cls, filename, extraction_path):
        """ Extract tar files """
        if not inpath('tar'):
            print("Error: need 'tar' binary in path")
        else:
            return subprocess.call(
                ["tar", "-C", extraction_path, "-axf", filename])

    @classmethod
    def extract_file_rpm(cls, filename, extraction_path):
        """ Extract rpm packages """
        if not inpath('rpm2cpio') or not inpath('cpio'):
            print("Error: need 'rpm2cpio' or 'cpio' binary in path")
        else:
            with popen_ctx(["rpm2cpio", filename], stdout=subprocess.PIPE) as proc:
                return subprocess.call(["cpio", "-idmv"], stdin=proc.stdout,
                                       cwd=extraction_path)

    @classmethod
    def extract_file_deb(cls, filename, extraction_path):
        """ Extract debian packages """
        if not inpath('ar'):
            print("Error: need 'ar' binary in path")
        else:
            result = subprocess.call(["ar", "x", filename], cwd=extraction_path)
            if result != 0:
                return result
            if not inpath('tar'):
                print("Error: need 'tar' binary not in path")
            else:
                datafile = glob.glob(os.path.join(extraction_path, "data.tar.*"))[0]
                result = subprocess.call(["tar", "-C", extraction_path, "-axf", datafile])
                return result

    @classmethod
    def extract_file_cab(cls, filename, extraction_path):
        """ Extract cab files """
        if not inpath('cabextract'):
            print("Error: need 'cabextract' binary in path")
        else:
            return subprocess.call(
                ["cabextract", "-d", extraction_path, filename])

    @classmethod
    def extract_file_zip(cls, filename, extraction_path):
        """ Extract zip files """
        if not inpath('unzip'):
            print("Error: need 'unzip' binary in path")
        else:
            return subprocess.call(
                ["unzip", "-qq", "-n", "-d", extraction_path, filename])
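
For the deb case, a pure-Python replacement for the `ar` + `tar` pair could parse the ar container directly and hand the `data.tar.*` member to `tarfile`. This is a sketch, not the project's code: `read_ar_members` and `extract_deb_data` are hypothetical names, the parser assumes the common 60-byte ar member header (GNU long-name tables are not handled), and the 0/1 return values only loosely mimic `subprocess.call`.

```python
import io
import tarfile

AR_MAGIC = b"!<arch>\n"


def read_ar_members(data):
    """Yield (name, payload) for each member of an ar archive held in memory."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    offset = len(AR_MAGIC)
    while offset + 60 <= len(data):
        header = data[offset:offset + 60]
        if header[58:60] != b"`\n":  # fixed terminator of every member header
            raise ValueError("corrupt ar member header")
        name = header[0:16].decode("ascii").rstrip(" ").rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        start = offset + 60
        yield name, data[start:start + size]
        offset = start + size + (size % 2)  # payloads are padded to an even length


def extract_deb_data(filename, extraction_path):
    """Unpack the data.tar.* member of a .deb; returns 0 on success, 1 otherwise."""
    with open(filename, "rb") as f:
        members = dict(read_ar_members(f.read()))
    data_names = [name for name in members if name.startswith("data.tar")]
    if not data_names:
        return 1
    # mode "r:*" lets tarfile detect the compression (gzip, bz2 and xz at least)
    with tarfile.open(fileobj=io.BytesIO(members[data_names[0]]), mode="r:*") as tar:
        if hasattr(tarfile, "data_filter"):  # extraction filters: Python 3.12+ and backports
            tar.extractall(extraction_path, filter="data")
        else:
            tar.extractall(extraction_path)
    return 0
```

Unlike the `glob(...)[0]` lookup above, a missing `data.tar.*` member produces a return code instead of an IndexError. The extraction filter, where available, also guards against archive members that try to write outside `extraction_path`.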

https://docs.python.org/3/library/shutil.html#archiving-operations may be useful here
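
For the zip and tar cases, `shutil.unpack_archive` covers zip and the tar family (tar, gztar, bztar, xztar) out of the box. It guesses the format from the file extension, so .apk or .jar files need `format="zip"`; rpm, deb and cab are not covered. The wrapper below is a hypothetical sketch (`extract_with_shutil` is an illustrative name) that maps failures to a non-zero return code.

```python
import shutil


def extract_with_shutil(filename, extraction_path, fmt=None):
    """Extract an archive with the standard library only.

    Returns 0 on success and 1 when shutil cannot handle the file, loosely
    mirroring the subprocess.call() return codes in extractor.py.
    `fmt` is passed through as shutil's `format` argument, e.g. "zip" for .apk/.jar.
    """
    try:
        shutil.unpack_archive(filename, extraction_path, format=fmt)
    except shutil.ReadError:
        return 1
    return 0
```

For example, `extract_with_shutil("app.apk", "out", fmt="zip")` unpacks an apk with no `unzip` binary on the system.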

terriko · 2019-03-08T17:56:15Z

As per #97, it sounds like there might be some room for improvement on the performance of strings.

terriko · 2019-03-21T23:53:43Z

Added instructions above on how to add a different type of test, for folk still looking for a first (or 15th) commit.

terriko · 2019-08-29T00:14:33Z

Completed and released in 0.3.0. Thanks @wzao1515 for all your great research and work to make this happen!

terriko added the gsoc label (Tasks related to our participation in Google Summer of Code) on Feb 8, 2019

terriko closed this as completed Aug 29, 2019
