Feature/merge with latest #16

sbneto · 2024-06-19T17:31:53Z

No description provided.

…odule import. Calling Database() just returns a proxy to a real database object that may or may not exist when the Database object is created. That's fine as long as no attributes/methods are accessed until the init_database function has been called, which must happen only once during the program's startup. For testing purposes, the reset_database function can be called before calling init_database again. This is useful for creating a fresh in-memory database for a test. Note that Database objects that were created in other modules at module-level scope will still be valid, and calling methods on them will be proxied to the new database instance. load_vms_tags() and load_vms_exists() have been modified so that they are not called during the import of web_utils but are called where needed; the database is only contacted the first time, and the resulting value is cached. No database queries should be made simply as a result of importing modules!
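The proxy arrangement described above can be sketched in plain Python. This is a minimal illustration under assumptions, not CAPE's code: `_RealDatabase`, `list_tags()` and the module-level `_DATABASE_INSTANCE` are hypothetical stand-ins, and only the names `Database`, `init_database`, `reset_database` and `load_vms_tags` come from the description. Using `functools.lru_cache` for the "contact the database only the first time" behavior is also an assumption.

```python
import functools

# Hypothetical module-level slot for the real database object.
_DATABASE_INSTANCE = None


class _RealDatabase:
    """Hypothetical stand-in for the object that actually talks to the database."""

    def __init__(self, dsn):
        self.dsn = dsn
        self.queries = 0  # counts round-trips so caching is observable

    def list_tags(self):
        self.queries += 1
        return ["x64", "win10"]


class Database:
    """Cheap proxy: creating one never touches the database."""

    def __getattr__(self, name):
        # Only called for attributes the proxy itself does not have.
        if _DATABASE_INSTANCE is None:
            raise RuntimeError("init_database() has not been called")
        return getattr(_DATABASE_INSTANCE, name)


def init_database(dsn="sqlite:///:memory:"):
    global _DATABASE_INSTANCE
    if _DATABASE_INSTANCE is not None:
        raise RuntimeError("init_database() must only be called once")
    _DATABASE_INSTANCE = _RealDatabase(dsn)
    return _DATABASE_INSTANCE


def reset_database():
    """For tests: allow init_database() to be called again."""
    global _DATABASE_INSTANCE
    _DATABASE_INSTANCE = None
    load_vms_tags.cache_clear()  # this sketch also drops cached query results


# Safe at import time: no query is issued here.
db = Database()


@functools.lru_cache(maxsize=None)
def load_vms_tags():
    # The database is contacted on the first call only; later calls hit the cache.
    return tuple(db.list_tags())
```

Because `Database.__getattr__` only runs for attributes the proxy lacks, creating `db = Database()` at import time is free, and the same `db` object follows whichever instance the most recent `init_database()` call installed.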

Use a scoped_session to make it easy to always get the current database session, with one session per thread. In the scheduler and in web requests, wrap the unit of work (a main loop iteration in the scheduler and a request/response cycle in the Django app) in a transaction so that either all of it or none of it is committed. This also removes the need for the classlock decorator around all of the Database methods, as well as the need to handle commit/rollback behavior in each of those methods. It does require us to be more aware of when the database is being used, since a transaction must be started explicitly. In SQLAlchemy 2.0+, we can add `autobegin=False` to the sessionmaker arguments, which will make it easier to see when database access is being requested (for example, by accessing an attribute of an object that is "expired", i.e. used outside of the transaction it was retrieved in). Without this knowledge, we could be making unnecessary calls to the database without realizing it. This type of transaction handling also makes it easier to do things like find and reserve a machine. Improvements to this will be coming in a PR soon, but for now, we can at least use `with_for_update()` to lock the row of the machine that is found. Since that query is not its own transaction, the row stays locked until the enclosing transaction ends, so the machine can be reserved later in the same transaction with the guarantee that no other thread will grab it in the meantime.
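The per-thread session and one-transaction-per-unit-of-work pattern can be approximated with the standard library alone. This sketch assumes SQLite via `sqlite3` rather than SQLAlchemy: `threading.local()` stands in for `scoped_session`, and because SQLite has no `SELECT ... FOR UPDATE`, `BEGIN IMMEDIATE` (which takes the database write lock up front) substitutes for `with_for_update()`. The `machines` table, `DB_PATH` and `find_and_reserve()` are hypothetical.

```python
import os
import sqlite3
import tempfile
import threading
from contextlib import contextmanager

# Hypothetical database location for the sketch.
DB_PATH = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")
_local = threading.local()


def session():
    """Return this thread's connection, creating it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        # isolation_level=None: no implicit transactions; we BEGIN/COMMIT ourselves.
        conn = sqlite3.connect(DB_PATH, isolation_level=None, timeout=10)
        _local.conn = conn
    return conn


@contextmanager
def transaction():
    """Wrap one unit of work: commit all of it or roll all of it back."""
    conn = session()
    # BEGIN IMMEDIATE takes SQLite's write lock now, so no other connection can
    # reserve the same machine between our SELECT and our UPDATE.
    conn.execute("BEGIN IMMEDIATE")
    try:
        yield conn
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")


def find_and_reserve():
    """Hypothetical helper; must be called inside transaction()."""
    conn = session()
    row = conn.execute(
        "SELECT id, name FROM machines WHERE reserved = 0 ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE machines SET reserved = 1 WHERE id = ?", (row[0],))
    return row[1]
```

A scheduler iteration would run `with transaction(): name = find_and_reserve()` and do the rest of its bookkeeping inside the same `with` block, so the reservation and everything else commit or roll back together.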

- Use fixtures appropriately (i.e. be more pytest-y).
- Honor the updated transaction handling. This generally consists of wrapping the API call in a transaction, as it would be during a web request or an iteration of the scheduler. Allow that transaction to be committed to make sure rows are written to the database correctly, then use another transaction to validate that the database was updated as expected. The validation also needs to be wrapped in a transaction so that the session doesn't implicitly begin one upon the first database command sent to it.
- Enable engine.echo so that, upon test failure or if `-s` is supplied to `pytest`, the SQL statements issued to the database are output. This allows manual verification that the expected commands are being issued.
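The testing approach above (a fresh database per test, one transaction for the action, a separate one for verification, and echoed SQL) looks roughly like this when reduced to the standard library. This is an illustration under assumptions: `sqlite3`'s `set_trace_callback` plays the role of `engine.echo`, the `fresh_db` context manager plays the role of a pytest fixture (in pytest it would be a `@pytest.fixture` that yields), and `add_task()` is a hypothetical API under test.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def fresh_db(echo_log):
    """Acts like a pytest fixture: a brand-new in-memory database per test."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    # Analogue of engine.echo: record every SQL statement sent to SQLite.
    conn.set_trace_callback(echo_log.append)
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
    try:
        yield conn
    finally:
        conn.close()


@contextmanager
def transaction(conn):
    """Explicit transaction, so nothing runs in an implicit one."""
    conn.execute("BEGIN")
    try:
        yield conn
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")


def add_task(conn):
    """Hypothetical API under test."""
    return conn.execute("INSERT INTO tasks (status) VALUES ('pending')").lastrowid


def test_add_task():
    statements = []
    with fresh_db(statements) as conn:
        # Act inside a transaction, as a web request or scheduler iteration would.
        with transaction(conn):
            task_id = add_task(conn)
        # Verify in a separate transaction, after the first one has committed.
        with transaction(conn):
            row = conn.execute("SELECT status FROM tasks WHERE id = ?", (task_id,)).fetchone()
    assert row == ("pending",)
    # On failure, the echoed statements show exactly what was sent.
    assert any(s.startswith("INSERT INTO tasks") for s in statements)
```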

- Improve the handling of tasks and machines and how they are updated in the database. Use transactions and locks appropriately so that changes are more atomic.
- Remove the need to hold the machine_lock for long periods of time in the main loop, and remove the need for batch scheduling of tasks.
- Make the code much cleaner and more readable, including a clearer separation of concerns among the various classes. Introduce a MachineryManager class that does what the name suggests. In the future, an API could be added to it that would let us dynamically update machines in the database without having to update a conf file and restart cuckoo.py.
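A `MachineryManager` that keeps its lock short might look like the following. Everything here beyond the class name is hypothetical (the `Machine` dataclass, `acquire`/`release`, tag matching), and the real class works against the database rather than an in-memory dict; the point is only that the lock guards the pick-and-mark step, not the whole analysis.

```python
import threading
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    tags: frozenset
    locked: bool = False


class MachineryManager:
    """Hypothetical sketch: owns the machine pool and hands machines out."""

    def __init__(self, machines):
        self._machines = {m.name: m for m in machines}
        # Held only long enough to pick and mark a machine,
        # never for the duration of an analysis.
        self._lock = threading.Lock()

    def acquire(self, required_tags=frozenset()):
        with self._lock:
            for machine in self._machines.values():
                if not machine.locked and required_tags <= machine.tags:
                    machine.locked = True
                    return machine
            return None

    def release(self, machine):
        with self._lock:
            machine.locked = False
```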

…arent process after forking. Use `engine.dispose` as described in https://docs.sqlalchemy.org/en/14/core/pooling.html#using-connection-pools-with-multiprocessing-or-os-fork. The benefit of this approach is that connection pools can still be used in the child processes.
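The linked SQLAlchemy guidance boils down to this: after a fork, the child must not reuse connections inherited from the parent's pool, but it may keep using the pool to open new ones. Recent SQLAlchemy versions expose this as `engine.dispose(close=False)` in the child. The sketch below shows the same idea with a hypothetical `TinyPool` class and the standard `os.register_at_fork(after_in_child=...)` hook (POSIX only); it is not SQLAlchemy's implementation.

```python
import os
import sqlite3


class TinyPool:
    """Hypothetical minimal pool standing in for SQLAlchemy's connection pool."""

    def __init__(self, path):
        self.path = path
        self._idle = []  # connections available for reuse
        self.owner_pid = os.getpid()

    def connect(self):
        return self._idle.pop() if self._idle else sqlite3.connect(self.path)

    def release(self, conn):
        self._idle.append(conn)

    def dispose(self, close=True):
        # close=False mirrors engine.dispose(close=False): the child forgets the
        # inherited connections instead of closing them, since the parent is
        # still using the same underlying file descriptors.
        if close:
            for conn in self._idle:
                conn.close()
        self._idle = []
        self.owner_pid = os.getpid()


pool = TinyPool(":memory:")

if hasattr(os, "register_at_fork"):  # POSIX only
    # Every forked child starts with an empty pool but can still use it.
    os.register_at_fork(after_in_child=lambda: pool.dispose(close=False))
```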

For scaling machinery, we can't set up the scaling semaphore until after the machinery has been initialized, since that's when the database is synchronized with the state of the cloud infrastructure and we need to use that to determine the limit of the semaphore.
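The ordering constraint (initialize the machinery, size the semaphore from the synchronized state, then start the maintenance thread) can be sketched as follows. Only the thread name `thr_maintain_scaling_bounded_semaphore` comes from these commits; `ScalingScheduler`, `initialize_machinery()` and `start_scaling()` are hypothetical, and the maintenance loop is a placeholder.

```python
import threading


class ScalingScheduler:
    """Hypothetical sketch of the start-up ordering; not CAPE's actual class."""

    def __init__(self):
        self.machine_count = None  # unknown until machinery is initialized
        self.scaling_semaphore = None
        self._stop = threading.Event()

    def initialize_machinery(self, machines_in_cloud):
        # Stand-in for synchronizing the database with the cloud infrastructure.
        self.machine_count = machines_in_cloud

    def start_scaling(self):
        if self.machine_count is None:
            raise RuntimeError("initialize_machinery() must run first")
        # The semaphore's limit depends on state that only exists after init.
        self.scaling_semaphore = threading.BoundedSemaphore(self.machine_count)
        thread = threading.Thread(
            target=self._maintain_semaphore,
            name="thr_maintain_scaling_bounded_semaphore",
            daemon=True,
        )
        thread.start()  # creating the Thread object alone never runs it
        return thread

    def _maintain_semaphore(self):
        # Periodic upkeep of the semaphore would go here; this sketch just idles.
        while not self._stop.wait(0.05):
            pass

    def stop(self):
        self._stop.set()
```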

If `max_machines_reached` is True, then `find_pending_task_not_requiring_machinery` can be called (if `allow_static` is True). Don't assert that there's no `machinery_manager`.

Co-authored-by: Tommy Beadle <[email protected]>
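The scheduling rule in this commit can be expressed as a small decision function. This is a hypothetical sketch: the two finder functions are passed in as callables, and the behavior when machines are still available is an assumption, not taken from the scheduler's code.

```python
from typing import Callable, Optional


def next_task(
    max_machines_reached: bool,
    allow_static: bool,
    find_pending_task: Callable[[], Optional[int]],
    find_pending_task_not_requiring_machinery: Callable[[], Optional[int]],
) -> Optional[int]:
    """Hypothetical sketch: return the ID of the next task to run, or None."""
    if not max_machines_reached:
        # A machine may be free, so any pending task is a candidate.
        return find_pending_task()
    if allow_static:
        # Every machine is busy, but static-only tasks need no machine at all.
        return find_pending_task_not_requiring_machinery()
    return None
```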

Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.0.7 to 2.2.2.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](urllib3/urllib3@2.0.7...2.2.2)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

alanjds

Looks good. Let's hope it works.

Tommy Beadle and others added 30 commits April 1, 2024 16:38

Move the AnalysisManager to its own module.

8392843

It is, so far, unchanged except for things required to make it work from a file separate from scheduler.py.

Update the unit tests for the database.

08c5aab

Update the remaining unit tests.

8af4e18

Also rework how Config caching is invalidated to make it work better for tests. More tests still need to be added for the Scheduler and AnalysisManager, but this at least gets us a little more testing than what was being done before.

Allow a "reserved" machine to be used when a tag is provided for a ta…

a68f2fe

…sk if the only machines with that tag are "reserved."

Merge pull request kevoreilly#2037 from tbeadle/database-and-schedule…

ed5f57e

…r-overhaul Database and scheduler overhaul

Remove test_scheduler.py. It has been moved/updated in test_analysis_…

bfa20bd

…manager.py.

Fix bug in analysis_manager.

5b0b021

Fix analysis_manager tests.

fa52ae0

- Use the correct logging level.
- Update the aux modules that are available.
- Use a data file that actually exists.

Merge pull request kevoreilly#2040 from tbeadle/database-and-schedule…

9d9ef1f

…r-overhaul Updates to scheduler overhaul

Merge branch 'master' into staging

bd03801

add customizable cape config directory

555de24

Update dist.py

a239be3

Merge branch 'master' into staging

273f080

Update analysis_manager.py

379724d

cleanup

3ae8ce3

Merge branch 'master' into staging

ddccfdc

Update machinery_manager.py

7d9cf94

Merge branch 'master' into staging

5e3be81

Merge pull request kevoreilly#2049 from tbeadle/machinery-lock-fix

b3f19c6

fix issue with creation of the machinery lock.

START the thr_maintain_scaling_bounded_semaphore thread.

99a36f7

Update poetry.lock.

559a21d

Merge pull request kevoreilly#2055 from tbeadle/start-scaling-thread

5f5b70e

START the thr_maintain_scaling_bounded_semaphore thread.

ci: Update requirements.txt

c1faff3

Merge branch 'master' into staging

f370903

karlhiramoto and others added 22 commits June 11, 2024 15:53

Agent: support streaming a file off the guest. (kevoreilly#2161)

99dd2cd

* option to stream output
* cleanup
* terminate so test returns
* use ThreadingTCPServer
* catch ConnectionResetError
* on linux join() must be called to release socket
* win32: call terminate
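A minimal version of streaming a file off the guest with `socketserver.ThreadingTCPServer` is sketched below. `StreamFileHandler`, `serve_file()` and `stop()` are hypothetical names, not the agent's actual API. The sketch reflects two points from the list above: `ConnectionResetError` is caught so a dropped client does not crash the handler, and the server thread is joined after shutdown so the listening socket is fully released.

```python
import socketserver
import threading

CHUNK = 64 * 1024


class StreamFileHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler: sends the server's file to the client in chunks."""

    def handle(self):
        try:
            with open(self.server.path, "rb") as f:
                while True:
                    chunk = f.read(CHUNK)
                    if not chunk:
                        break
                    self.request.sendall(chunk)
        except (ConnectionResetError, BrokenPipeError):
            # The host went away mid-stream; don't crash the handler thread.
            pass


def serve_file(path, host="127.0.0.1", port=0):
    server = socketserver.ThreadingTCPServer((host, port), StreamFileHandler)
    server.path = path  # custom attribute read by the handler
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread


def stop(server, thread):
    server.shutdown()      # make serve_forever() return
    server.server_close()  # close the listening socket
    thread.join()          # wait for the server thread to finish
```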

style: Automatic code formatting

e060749

Update quarantine.py

6729b0b

style: Automatic code formatting

3935afa

Monitor update: Unpacker enhancement: capture modified mapped images

413a493

Update quarantine.py

ec37a5d

Update quarantine.py

26e5e61

Update quarantine.py

95c6f21

Update quarantine.py

6b4d1ff

Update kvm-qemu.sh

8135fe2

capa 7.1.0

3c495d4

ci: Update requirements.txt

2dc76f9

Fix correct storage path for 7zip (kevoreilly#2170)

dab1d69

fix dnfile 0.15 parsers (kevoreilly#2171)

eae3c56

Update Njrat.py

296d3d5

Update Quickbind yara (kevoreilly#2173)

317b861

Update cape2.sh (kevoreilly#2174)

ac53512

Added libyara-dev to requirements to fix the build error (magic.h: no such file or directory)

ci: Update requirements.txt

d374a72

Don't install pyre2 anymore

c0ec56b

Merge branch 'master' of https://github.com/polyswarm/CAPEv2 into fea…

e05d683

…ture/merge-with-latest

sbneto requested review from alanjds, mrsarm, supernothing, mjbradford89 and mrtizmoatwork June 19, 2024 17:33

alanjds approved these changes Jun 19, 2024

sbneto merged commit 35b165a into develop Jun 19, 2024

sbneto deleted the feature/merge-with-latest branch June 19, 2024 17:37
