Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/merge with latest #16

Merged
merged 177 commits into from
Jun 19, 2024
Merged

Feature/merge with latest #16

merged 177 commits into from
Jun 19, 2024

Conversation

sbneto
Copy link
Collaborator

@sbneto sbneto commented Jun 19, 2024

No description provided.

Tommy Beadle and others added 30 commits April 1, 2024 16:38
…odule import.

Calling Database() just returns a proxy to a real database object that
may or may not exist when the Database object is created. That's fine as
long as no attributes/methods are accessed until the init_database
function has been called, which must be called only once during the
program's startup. For testing purposes, the reset_database function can
be called before calling init_database again. This is useful to create a
fresh in-memory database for the test. Note that the Database objects
that were created in other modules at module-level scope will still be
valid and calling methods on them will be proxied to the new database
instance.

load_vms_tags() and load_vms_exists() have been modified so that they
are not called during import of web_utils but are called where needed
and the database is only contacted the first time--the resulting value
is cached. No database queries should be made simply as a result of
importing modules!
Use a scoped_session to make it easy to always get the current database
session with one per thread. In the scheduler and web requests, wrap the
unit of work (a main loop iteration in the scheduler and a
request/response in the Django app) in a transaction so that the whole
thing or none of it will be committed. This also removes the need for
the classlock decorator around all of the Database methods. It also
removes the need for handling commit/rollback behavior in each of those
methods.

This does make us be more aware of when the database is getting
used, by explicitly starting a transaction. In sqlalchemy 2.0+, we can
add `autobegin=False` to the sessionmaker arguments, which will make it
easier to see when database access is being requested (like by using an
attribute of an object that is "expired", i.e. outside of the
transaction it was retrieved in). Without this knowledge, we could be
making unnecessary calls to the database and not know it.

This type of transaction handling also makes it easier to do things like
find and reserve a machine. Improvements to this will be coming in a PR
soon, but for now, we can at least use `with_for_update()` to lock the
row of the machine that is found and, since that single request is not
its own transaction, it can be locked later in the same transaction and
we are guaranteed that no other thread will grab it.
- Use fixtures appropriately (i.e. be more pytest-y)
- Honor the updated transaction handling. This generally consists of
  wrapping the API call in a transaction as it would be during a web
  request or an iteration of the scheduler. Allow that transaction to be
  committed to make sure rows are written to the database correctly. Use
  another transaction to do validation that the database was updated as
  expected. We need to wrap those in a transaction so that it doesn't
  create an implicit one upon the first database command sent to it.
- Enable engine.echo so that, upon test failure or if `-s` is supplied
  to `pytest`, the SQL statements issued to the database are output.
  This enables for manual verification that it's issuing the commands we
  expect it to.
It is, so far, unchanged except for things required to make it work from
a file separate from scheduler.py.
- Improve the handling of tasks, machines, and how they are updated in
  the database. Use transactions and locks appropriately so that changes
  are more atomic.
- Remove the need for holding the machine_lock for long periods of time
  in the main loop and remove the need for batch scheduling tasks.
- Make the code a lot cleaner and readable, including separation of concerns
  among various classes. Introduce a MachineryManager class that does
  what the name suggests. In the future, this could have an API added
  that could provide us a way to dynamically update machines in the
  database without having to update a conf file and restart cuckoo.py.
Also rework how Config caching is invalidated to make it work better for
tests.
More tests still need to be added for the Scheduler and AnalysisManager,
but this at least gets us a little more testing than what was being done
before.
…sk if the only machines with that tag are "reserved."
…arent process after forking.

Use `engine.dispose` as described in
https://docs.sqlalchemy.org/en/14/core/pooling.html#using-connection-pools-with-multiprocessing-or-os-fork.
The benefit of this approach is that connection pools can still be used
in the child processes.
…r-overhaul

Database and scheduler overhaul
- Use the correct logging level.
- Update the aux modules that are available.
- Use a data file that actually exists.
…r-overhaul

Updates to scheduler overhaul
For scaling machinery, we can't set up the scaling semaphore until after
the machinery has been initialized, since that's when the database is
synchronized with the state of the cloud infrastructure and we need to
use that to determine the limit of the semaphore.
fix issue with creation of the machinery lock.
START the thr_maintain_scaling_bounded_semaphore thread.
karlhiramoto and others added 22 commits June 11, 2024 15:53
* option to stream output

* cleanup

* terminate so test returns

* use ThreadingTCPServer

* catch ConnectionResetError

* on linux  join() must be called to release socket

* win32: call terminate
If max_machines_reached is True, then
find_pending_task_not_requiring_machinery can be called (if allow_static
is True). Don't assert that there's no machinery_manager.

Co-authored-by: Tommy Beadle <[email protected]>
added libyara-dev no requirements to fix (magic.h: no such file or directory)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.0.7 to 2.2.2.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](urllib3/urllib3@2.0.7...2.2.2)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Copy link

@alanjds alanjds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Lets hope it to work.

@sbneto sbneto merged commit 35b165a into develop Jun 19, 2024
@sbneto sbneto deleted the feature/merge-with-latest branch June 19, 2024 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.