
async API #569

Merged · 11 commits into ARM-software:master · Jul 28, 2022
Conversation

@douglas-raillard-arm (Contributor) commented on Aug 19, 2021

Add an async API to devlib to take advantage of concurrency more easily, at both a coarse and a fine grain.

Note: this currently requires Python >= 3.7

Based on #568
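For illustration, a minimal sketch of how the async variants could read at the call site. Hedged: the .asyn accessor on converted Target methods is an assumption based on this PR's description, not a definitive API reference.

import asyncio

async def setup(target):
    # Both commands are dispatched concurrently instead of serializing
    # the SSH round-trips one after the other.
    await asyncio.gather(
        target.execute.asyn('echo hello'),
        target.execute.asyn('uname -a'),
    )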

@douglas-raillard-arm (Contributor, Author) commented on Aug 19, 2021

Preliminary results on:

        with target.cpufreq.use_governor('performance'):
            pass

1.4x faster on an SSH target (Juno r0):

  • on this branch: 3.7s (avg over 10 runs)
  • on master: 5.3s (avg over 10 runs)

2.2x faster on a local target:

  • on this branch: ~9s
  • on master: ~20s

@douglas-raillard-arm (Contributor, Author) commented

I checked all instances of threading.Lock in devlib and it should be safe, as none is on the path used when running a background command or any other coroutine.

If a function taking a threading.Lock were called by two coroutines that end up running concurrently, the second coroutine would block waiting for the lock, thereby blocking the main thread and deadlocking.
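A minimal standalone sketch of that failure mode (illustrative only, not devlib code):

import asyncio
import threading

lock = threading.Lock()

async def critical():
    # threading.Lock.acquire() blocks the whole OS thread, not just this
    # task. If another task on the same event loop already holds the lock,
    # the loop can never resume that task to release it: deadlock.
    lock.acquire()
    try:
        await asyncio.sleep(0.1)  # suspension point while holding the lock
    finally:
        lock.release()

async def main():
    await asyncio.gather(critical(), critical())

# asyncio.run(main())  # hangs forever for the reason described above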

@douglas-raillard-arm (Contributor, Author) commented

I also tried various combinations of thread pools, including running a blocking execute() in a different thread (so with separate SSH connections), and I did not get any real timing variation.

What I did get is some errors, probably because I was maintaining too many connections to the SSH server, so the current approach with one connection and many concurrent SSH channels seems to be the best. I could not saturate the server with open channels (maybe there is no limit). If that were to happen, we could use an asyncio.Semaphore to limit the number of bg commands in flight.
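A sketch of that fallback. Hedged: bounded_bg and MAX_BG_COMMANDS are made-up names, and the .asyn accessor is an assumption; this is not the PR's actual implementation.

import asyncio

MAX_BG_COMMANDS = 10  # assumption for illustration; not a devlib constant

sem = asyncio.Semaphore(MAX_BG_COMMANDS)

async def bounded_bg(target, cmd):
    # Hold a semaphore slot for the lifetime of the command so that at
    # most MAX_BG_COMMANDS SSH channels are in flight at any time.
    async with sem:
        return await target.execute.asyn(cmd)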

@douglas-raillard-arm (Contributor, Author) commented on Aug 31, 2021

There are a few things to clean up (docstrings etc.), so if you are happy with the API as it stands I will proceed with:

  • adding docstrings
  • renaming asyn.parallel to asyn.concurrently
  • removing the .concurrent attribute, as it is not that useful in practice I think (it allows running the function concurrently, only blocking to get the result when it is awaited; the "normal" coroutine behavior is to not do anything until the return value is awaited). asyn.concurrently is usually better, as it makes obvious what runs concurrently and avoids any kind of "task leak" (see the sketch below).

EDIT: cleanup done. Beyond possible high-level doc and converting more bits, the change is somewhat ready.
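A hedged sketch of how asyn.concurrently reads at the call site, assuming it awaits a collection of coroutines and returns their results, akin to asyncio.gather (the sysfs paths and the write_value.asyn accessor are assumptions):

from devlib.utils import asyn

async def configure(target):
    # The concurrency boundary is explicit at the call site, and nothing
    # can leak: both writes have completed once this await returns.
    await asyn.concurrently((
        target.write_value.asyn('/sys/foo', 1),  # hypothetical paths
        target.write_value.asyn('/sys/bar', 2),
    ))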

@douglas-raillard-arm (Contributor, Author) commented

While working on a bug fix for another problem, it came to my attention that SSH servers accept a fixed number of "sessions" per connection. These "sessions" seem to map to paramiko's channels. This means that for a given connection, we might be limited to e.g. 10 concurrent channels (the OpenSSH default). This is configurable with MaxSessions on the OpenSSH side and results in an exception on the paramiko side: Could not open an SSH channel: ChannelException(2, 'Connect failed')

I'll probably need to handle that in this PR to limit the number of opened channels. There does not seem to be a standard way of getting the maximum number of channels, but we can probably handle the exception as a hint, or attempt to read the OpenSSH config and parse it to find the MaxSessions value (a bit hacky though).
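One way the exception-as-a-hint idea could look, sketched over a raw paramiko Transport (probe_session_limit is a made-up name, not what the PR implements):

import paramiko

def probe_session_limit(transport, cap=64):
    # Open channels until the server refuses one, treating the
    # ChannelException as the hint that MaxSessions was reached.
    channels = []
    try:
        for _ in range(cap):
            channels.append(transport.open_session())
    except paramiko.ChannelException:
        pass  # 'Connect failed': the server hit its session limit
    for chan in channels:
        chan.close()
    return len(channels)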

@douglas-raillard-arm (Contributor, Author) commented

@setrofim @marcbonnici I've updated the PR with automatic detection of the number of allowed background commands, along with some automatic limiting in the async API to use at most half of the available channels.

Next steps:

  • Decide what version of Python we want to support
  • "Instrument" read_value() and write_value() so that we can detect when we are trying to read/write the same files concurrently. This should catch most issues where we try to do some setup concurrently, which is where the bulk of the speed-up will come from.

@douglas-raillard-arm (Contributor, Author) commented on Nov 11, 2021

Updated the PR with checks to make sure that no file accesses conflict when executing concurrent asyncio tasks. An artificial example:

from devlib.utils.asyn import *

m = ConcurrentResourceManager()

# Two views of the same host file: r1 is a write access, r2 a read access.
r1 = FileResource('host', 'file1', 'w')
r2 = FileResource('host', 'file1', 'r')

async def f1():
    # m.track_resource(r1)
    m.track_resource(r2)
    print('from f1')


async def f2():
    # f2 writes file1 while f1 reads it concurrently, which the checks
    # are meant to flag as a conflict.
    m.track_resource(r1)
    m.track_resource(r2)
    print('from f2')

async def f3():
    # m.track_resource(r1)
    print('from f3')

run(
    m.concurrently(
        (
            f1(),
            m.concurrently(
                (
                    f2(),
                    f3(),
                )
            )
        )
    )
)

In the PR, ConcurrentResourceManager is target.resource_manager, which is thread-local.

Note: this depends on Python >= 3.7 for asyncio.current_task().

@douglas-raillard-arm (Contributor, Author) commented

FWIW, we have been using this PR in Lisa for a month now (the Lisa repo vendorizes devlib as a git subtree) and so far I haven't observed or heard of any issue related to it.

Commits:

  • Implement __aenter__ and __aexit__ on nullcontext so it can be used as an asynchronous context manager.
  • The connection returned by Target.get_connection() does not have its .busybox attribute initialized. This is expected for the first connection, but connections created for new threads should have busybox set.
  • Remove all the tls_property from the state, as they will be recreated automatically.
  • Remove dependencies that are ruled out due to the current Python minimal version requirement.
  • Require Python >= 3.7 in order to have access to a fully fledged asyncio module.
@douglas-raillard-arm (Contributor, Author) commented on Jul 26, 2022

@marcbonnici PR updated with:

  • Fix to connect(max_async)
  • Fix to the cpufreq.use_governor() racy file write
  • Fix to the asyn.asyncf() decorator for async generators: async generators are now consumed completely when crossing a blocking boundary, rather than failing to be awaited. We could maybe do something a bit smarter to preserve laziness, provided that we can asyncio.run() anywhere, so I'll have a look later.
  • Added the asyn.memoized_method() decorator, which works for both async and non-async code. It does not memoize non-hashable data, so it avoids the issues described in: Surprising memoized behavior for mutable containers #341 (see the usage sketch below)
  • Converted the final bits of the cpufreq module to async

EDIT: forgot to add the last bullet entry
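A usage sketch, assuming the decorator lives in devlib.utils.asyn as the description suggests (the class and method names are made up):

from devlib.utils.asyn import memoized_method

class Example:
    @memoized_method
    async def read_version(self):
        # Expensive target access; computed once, then cached per instance.
        ...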

Commits:

  • Home for async-related utilities.
  • Add async variants of Target methods.
  • Allow the user to set a maximum number of concurrent connections used to dispatch non-blocking commands when using the async API.
  • Make use of the new async API to speed up other parts of devlib.
@douglas-raillard-arm (Contributor, Author) commented

PR re-updated with a lazy async generator blocking shim. This preserves the laziness of the async gen, made possible by the nest_asyncio package.
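A sketch of the shim idea (illustrative, not the PR's actual code; it assumes the generator holds no loop-bound resources across yields):

import asyncio
import nest_asyncio

nest_asyncio.apply()  # makes asyncio.run() re-entrant

def blocking_iter(agen):
    # Pull items out of an async generator from synchronous code one at a
    # time, preserving its laziness instead of draining it up front.
    while True:
        try:
            yield asyncio.run(agen.__anext__())
        except StopAsyncIteration:
            return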

use_governor() was trying to set both the per-cpu and the global tunables concurrently for each governor, which led to a write conflict.

Split the work into the per-governor global tunables and the per-cpu tunables, and do all of that concurrently. Each task is therefore responsible for a distinct set of files and all is well.

Also remove @memoized on async functions. It will be reintroduced in a later commit when there is a safe alternative for async functions.
Add a memoize_method decorator that works for async methods. It will not leak memory, since the memoization cache is held in the instance __dict__, and it does not rely on hacks to hash unhashable data.
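A simplified sketch of that mechanism, reduced to zero-argument async methods (the real decorator handles more cases):

import functools

def memoize_method(f):
    # The cached value lives in the instance __dict__, so it is garbage
    # collected together with the instance and nothing needs to be hashed.
    attr = '__memoized_' + f.__name__
    @functools.wraps(f)
    async def wrapper(self):
        try:
            return self.__dict__[attr]
        except KeyError:
            self.__dict__[attr] = result = await f(self)
            return result
    return wrapper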
@marcbonnici merged commit fefdf29 into ARM-software:master on Jul 28, 2022.