
Memory usage appears to have significantly increased with latest version #665

Closed
nycnewman opened this issue Mar 7, 2020 · 14 comments
Labels
bug Something isn't working component-provider-gcp Affects GCP provider

Comments

@nycnewman

nycnewman commented Mar 7, 2020

Describe the bug

When running ScoutSuite against a GCP organization, we see significant memory usage (seen up to 32 GB) in recent releases. We don't remember seeing this with earlier versions of the program.

To Reproduce

$ python3 scout.py gcp --user-account --organization-id <org-id> --all-projects

Relatively small GCP setup: ~140 projects, the majority of which don't have any GCP resources (only app-scripts).

@nycnewman nycnewman added bug Something isn't working potential Unconfirmed issue labels Mar 7, 2020
@nycnewman
Author

Saw it get to 44 GB of memory use, and eventually hit a "Bad file descriptor" error (due to the time taken to execute?)

@x4v13r64
Collaborator

@nycnewman thanks for opening the issue, a few things:

  • Very little has changed GCP-wise, so I'm not sure why the memory usage would have increased.
    • Could you test with a previous version and confirm this is the case?
    • We don't have access to such an environment so I doubt we can test this on our end.

@kareem-DA

Hello j4v. I work with nycnewman and I would be happy to work with you on this. Over the last several days, I have been slowly rolling back the version to see when this happened. My last run was on 5.2.0, and I got the same result. I am running on a machine with 15 GB of RAM, but I am consistently maxing it out.

@x4v13r64
Collaborator

Cheers @kareem-DA, if you can identify what version causes a high memory usage (if any) that would be a good first step. Another option would be to filter per service (--services) or per account. Maybe you have one account with a gazillion resources?

Also, feel free to contact us through [email protected] if you can share information you don't want on GitHub.

@kareem-DA

kareem-DA commented Mar 12, 2020

Ok. I did some more digging to narrow this down. I have run pretty much all of the versions at this point: 5.6, 5.5, 5.4, 5.3, 5.2, and 5.1. Anything earlier than that ran into other issues. On all versions I was seeing the same thing. I then re-ran with the --services flag, omitting computeengine.

This worked without any problems.

docker run --rm -t -v /opt/scoutcreds/:/root/creds:ro -v /opt/scout/gcp/$(date +%Y%m%d):/opt/scoutsuite-report <removed>/scoutsuite:latest gcp --service-account /root/creds/gcp.json  --organization-id  <removed>  --services credentials cloudresourcemanager cloudsql cloudstorage iam kms 

The version of ScoutSuite that this docker image is built on is 5.6.

@x4v13r64
Collaborator

So... where's the issue? From what you're telling me this is a non-issue, i.e. there isn't a significant memory usage increase in one of the versions?

@kareem-DA

No, there is an issue; I just can't pin it down to a version. I don't know what happened, but there is something somewhere in the compute engine service. Everything else runs fine. I started a run with just the compute engine last night. It ran for 18 hours and used up all 16 GB of RAM on the machine. I am not sure of the best route to debug this from here. On my run last night, I also included the --debug option, but I don't see anything that would indicate where it went wrong. I see a number of errors where the code tried to connect to an API that isn't enabled, and then it finally runs out of resources and can't make connections. Unfortunately, that's all I have.

@nycnewman
Author

There is a memory issue that causes Scout to use 45 GB on my machine before dying. This may be a Google API change rather than your code, but something has occurred in the last two months that causes this. As my colleague says, it appears to be related to the checks you are doing against Compute services.

@x4v13r64
Collaborator

Following up through email.

@rtomlinson-latacora
Contributor

rtomlinson-latacora commented Mar 16, 2020

I think this might be somewhat related to #443.
We will sometimes run out of memory when running against large organizations and I think it has to do with too many clients being created.
I have a solution that is more PoC at this point but seems to work.
https://github.com/latacora/ScoutSuite/blob/sema-instances/ScoutSuite/providers/gcp/facade/gce.py

In this example, the idea is to use a semaphore to limit the number of clients being created for fetching instance data. Without it, ScoutSuite will just create some huge number of gcp clients before getting to the blocking awaits.

A more complete solution would probably be to introduce the semaphore at a higher level and rewrite the facade modules to acquire the semaphore before making requests with a client. Presumably you'd be able to set the semaphore value from the CLI as well.
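A minimal sketch of the throttling idea, using asyncio directly (the zone names, function names, and limit value are illustrative, not ScoutSuite's actual API):

```python
import asyncio

# Cap on concurrently running fetches; without it, the gather() below
# would effectively spin up one client per zone before any await blocks.
# The value is illustrative and would ideally be CLI-configurable.
MAX_CONCURRENT_CLIENTS = 10

async def fetch_instances(zone, semaphore):
    # Hypothetical fetch: in ScoutSuite the real version would build a
    # Compute Engine client here and list instances for the zone.
    async with semaphore:
        await asyncio.sleep(0)  # stands in for the real API call
        return f"instances in {zone}"

async def main():
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_CLIENTS)
    zones = [f"zone-{i}" for i in range(100)]
    return await asyncio.gather(*(fetch_instances(z, semaphore) for z in zones))

results = asyncio.run(main())
print(len(results))  # 100
```

With the semaphore in place, at most MAX_CONCURRENT_CLIENTS clients exist at any moment, trading peak memory for wall-clock time, which matches the slower-but-bounded behaviour described below.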

@rtomlinson-latacora
Contributor

I don't have exact numbers, but without this fix ScoutSuite will easily kill a docker container with 30+ GB available to it. With it, I'm not even sure, but maybe 2-4 GB? The run does take a significant amount of time, though.

@x4v13r64
Collaborator

x4v13r64 commented Mar 17, 2020

@rtomlinson-latacora thanks for the info, I do think it's related to #443!

My initial comment to @kareem-DA:
The underlying issue is that the GCP client isn't thread safe since it relies on httplib2 (googleapis/google-cloud-python#3501), which forces us to create a new client for every thread/request. This is very costly in both time and memory.
Looks like https://github.com/GoogleCloudPlatform/httplib2shim could be leveraged, will have to test.

Looking at the above issue & googleapis/google-cloud-python#3674, it looks like httplib2 is no longer in use there, so I'd think we can start reusing clients as we do for the other providers. Will have to test it out.

There are 2 libraries in use (https://cloud.google.com/apis/docs/client-libraries-explained), https://github.com/googleapis/google-api-python-client (uses httplib2, not thread safe) and https://github.com/googleapis/google-cloud-python (uses requests, thread safe).

The issue is that for a number of services (e.g. Compute Engine), there is no official support for the thread safe version.

@x4v13r64 x4v13r64 added this to the 5.9.0 milestone Mar 17, 2020
@x4v13r64 x4v13r64 added component-provider-gcp Affects GCP provider and removed potential Unconfirmed issue labels Mar 17, 2020
@x4v13r64
Collaborator

@rtomlinson-latacora, @kareem-DA & @nycnewman I've opened the #676 PR, which uses https://github.com/GoogleCloudPlatform/httplib2shim to monkey patch the GCP library that's not thread safe. From my tests it runs fine and considerably diminishes memory consumption. Could you test on your end?

@x4v13r64 x4v13r64 modified the milestones: 5.9.0, 5.8.0 Mar 20, 2020
@x4v13r64
Collaborator

Closing, this should now be fixed in develop.
