
Add tracking signals for getting request/response bodies. #2767

Merged

Conversation

kowalski (Contributor)

This is the PR for issue #2755.

@codecov-io

codecov-io commented Feb 26, 2018

Codecov Report

Merging #2767 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2767      +/-   ##
==========================================
+ Coverage   97.98%   97.99%   +<.01%     
==========================================
  Files          39       39              
  Lines        7347     7377      +30     
  Branches     1289     1296       +7     
==========================================
+ Hits         7199     7229      +30     
  Misses         47       47              
  Partials      101      101
Impacted Files Coverage Δ
aiohttp/client_reqrep.py 97.44% <100%> (+0.05%) ⬆️
aiohttp/tracing.py 100% <100%> (ø) ⬆️
aiohttp/http_writer.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5382822...d7c995a. Read the comment docs.

@asvetlov (Member) left a comment

I like the idea, but the implementation needs polishing.

"""Writes chunk of data to a stream.

write_eof() indicates end of stream.
writer can't be used after write_eof() method being called.
write() return drain future.
"""
self.on_chunk_sent.freeze()
Member

I'm pretty sure there is a better place for signal freezing than the first write attempt.

Contributor Author

Actually, I was quite surprised by the whole concept of freeze(). It's very tricky! Do you know why it's necessary at all?
The only reason I can think of is that the author wanted to prevent modifications to the list of handlers while send() is being processed.

Still, it's possible that we will decide to remove this Signal altogether, but if we don't, what do you think is the correct place to call freeze()?

  • I can't do this in __init__ because I want to add handlers later
  • I can't delegate this responsibility to the user of the class, because that introduces a breaking change: code which currently works, like
    writer = StreamWriter(...)
    await writer.write(...)
    
    will start raising RuntimeError
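
(For context, the frozen-signal behaviour under discussion is roughly the following -- a simplified sketch, not aiohttp's actual Signal implementation:)

    class Signal(list):
        """Simplified stand-in: a list of async callbacks that must be
        frozen before send() and that rejects new handlers afterwards."""

        def __init__(self):
            super().__init__()
            self._frozen = False

        def freeze(self):
            self._frozen = True

        def append(self, callback):
            if self._frozen:
                raise RuntimeError("Cannot add a handler to a frozen signal")
            super().append(callback)

        async def send(self, *args, **kwargs):
            if not self._frozen:
                raise RuntimeError("Cannot send a non-frozen signal")
            for callback in self:
                await callback(*args, **kwargs)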

@@ -56,13 +59,18 @@ def _write(self, chunk):
raise asyncio.CancelledError('Cannot write to closing transport')
self._transport.write(chunk)

def write(self, chunk, *, drain=True, LIMIT=64*1024):
def write(self, chunk, *, drain=True, LIMIT=64 * 1024):
Member

Don't touch it if you don't change the logic.
If you want to update the code style -- do it in a separate PR.
For this particular line I'm pretty happy with the status quo.

Contributor Author

That wasn't me, it was my emacs ;) I will revert this change.

Contributor Author

Fixed

Contributor Author

Also, I fixed setup.cfg so that this line is not altered by anyone else who uses autopep8.

Member

Cool

"""Writes chunk of data to a stream.

write_eof() indicates end of stream.
writer can't be used after write_eof() method being called.
write() return drain future.
"""
self.on_chunk_sent.freeze()
self.loop.create_task(
Member

Spawning a new task is not what we need.
Technically .write is a coroutine; we need to convert it into a genuine async function.
Maybe I'll do it very soon in a separate PR.

Contributor Author

Are you sure we don't want this in another task? Whatever the handler does asynchronously will slow down writing the response.

Also, .write isn't actually treated as a coroutine, because it's called like:

                for chunk in self.body:
                    writer.write(chunk)

(without await), so the coroutine it returns is never awaited.
I guess I would have to track down all the places where StreamWriter.write is used and add await there. That's not impossible, but it's a rather big refactor.

Member

Yes, I'm sure. Spawning a task without waiting for the result has an ugly smell.
More importantly, spawning a task may reorder the received signals, e.g. the user may get a request chunk after the request has finished.

.write should be an async method. I'm working on this refactoring (required for #2698 anyway).
I suggest waiting until my work is done.
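
(To illustrate the reordering concern with a self-contained asyncio sketch -- not aiohttp code: awaiting the handler inline keeps chunk events strictly before the "finished" event, while fire-and-forget tasks can run after it.)

    import asyncio

    events = []

    async def on_chunk(chunk):
        events.append(("chunk", chunk))

    async def send_inline():
        for chunk in (b"a", b"b"):
            await on_chunk(chunk)              # awaited: completes before we move on
        events.append(("finished", None))

    async def send_with_tasks():
        loop = asyncio.get_running_loop()
        for chunk in (b"a", b"b"):
            loop.create_task(on_chunk(chunk))  # scheduled, not yet executed
        events.append(("finished", None))      # recorded before the chunk handlers run
        await asyncio.sleep(0)                 # now the pending tasks execute

    async def main():
        await send_inline()
        print(events)    # chunk, chunk, finished
        events.clear()
        await send_with_tasks()
        print(events)    # "finished" comes first, then the chunk events

    asyncio.run(main())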

Contributor Author

Got it, I will wait until the refactoring is merged.

@@ -573,6 +587,9 @@ def __init__(self, method, url, *,
self._auto_decompress = auto_decompress
self._cache = {} # reqired for @reify method decorator

# avoid circular reference so that __del__ works
self.on_chunk_received = Signal(owner=None)
Member

Not sure if we need a signal here.
It looks like overuse of the signal concept: you are creating an on_chunk_received signal whose only consumer is aiohttp internals itself. It is not very efficient; the signal subscription is literally a waste of time.
I suggest passing the traces directly to the response object.

Contributor Author

Here the approach of passing traces will actually work quite well. I introduced the signal to make it symmetrical with the request side. I will update this to pass traces and remove the signal.
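
(On the response side, the "pass traces directly" approach ends up roughly like this -- a sketch with illustrative names, not the exact PR code:)

    class ClientResponse:
        def __init__(self, method, url, *, traces=None):
            self.method = method
            self.url = url
            self._traces = traces if traces is not None else []

        async def _notify_chunk_received(self, chunk):
            # Called from wherever body data arrives; no Signal object needed,
            # each configured trace simply gets the chunk directly.
            for trace in self._traces:
                await trace.send_response_chunk_received(chunk)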

@@ -31,6 +32,8 @@ def __init__(self, protocol, transport, loop):
self._compress = None
self._drain_waiter = None

self.on_chunk_sent = Signal(self)
Member

Again pass traces instead of creating a new internal-only signal.

Contributor Author

Well, please consider that StreamWriter is used by both the HTTP client and the server. If I pass the traces in __init__() I will have to do something like:

def write( ):
    for trace in self.traces:
        await trace.sent_request_chunk_sent(...)
    ...

Now if you consider that code in the context of the HTTP server, it clearly stops making sense.
This is why I introduced a signal and named it on_chunk_sent, without specifying whether it is an HTTP request being sent or an HTTP response (as a server).

Member

Let's avoid signals for internal-only usage.
Here we have two options: pass a callback for sending signals or derive a ClientStreamWriter class. Personally I prefer the second approach.

Contributor Author

I will remove the Signal in favor of accepting an optional callable to be used as a callback.
I prefer not to create a child class for this - too many layers of inheritance hurt code readability.
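
(Roughly the shape being agreed on here -- a simplified StreamWriter sketch with an optional callback, not the exact aiohttp class; drain/limit handling omitted:)

    class StreamWriter:
        def __init__(self, protocol, transport, loop, *, on_chunk_sent=None):
            # on_chunk_sent is an optional coroutine function; None means
            # "no tracing", which is what the HTTP server side passes.
            self._protocol = protocol
            self._transport = transport
            self.loop = loop
            self._on_chunk_sent = on_chunk_sent

        async def write(self, chunk):
            if self._on_chunk_sent is not None:
                await self._on_chunk_sent(chunk)
            self._transport.write(chunk)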

Member

Ok

@asvetlov (Member)

Documentation should be updated as well

@asvetlov asvetlov mentioned this pull request Feb 27, 2018
@asvetlov (Member)

@kowalski #2774 is done, write() is async now

@kowalski kowalski force-pushed the feature/add-signals-for-reqres-chunks branch from e093c00 to 24e1db9 Compare February 27, 2018 13:24
@kowalski (Contributor Author)

@asvetlov all fixes done, including adding documentation.
I'm not squashing the commits so that you can see what changed. I can squash them after review to keep the history clean.

@asvetlov (Member) left a comment

Looks good, but please fix a couple of notes.

@@ -475,7 +477,14 @@ def keep_alive(self):
if self.url.raw_query_string:
path += '?' + self.url.raw_query_string

writer = StreamWriter(conn.protocol, conn.transport, self.loop)
async def on_chunk_sent(chunk):
Member

Convert the function into a ClientRequest method. No need to create a nested function on every HTTP request.

Contributor Author

done
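
(The change in shape, roughly: one reusable bound method passed as the writer callback instead of a closure rebuilt on every request. Method and trace names here are illustrative, not necessarily the final ones:)

    class ClientRequest:
        def __init__(self, *, loop=None, traces=None):
            self.loop = loop
            self._traces = traces if traces is not None else []

        async def _on_chunk_request_sent(self, chunk):
            # Passed to StreamWriter as on_chunk_sent; defined once on the
            # class rather than nested inside send() for every request.
            for trace in self._traces:
                await trace.send_request_chunk_sent(chunk)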

@@ -62,6 +70,9 @@ def _write(self, chunk):
writer can't be used after write_eof() method being called.
write() return drain future.
"""
if self._on_chunk_sent:
Member

Please do check self._on_chunk_sent is not None. It is more idiomatic.

Contributor Author

changed

'TraceDnsResolveHostStartParams', 'TraceDnsResolveHostEndParams',
'TraceDnsCacheHitParams', 'TraceDnsCacheMissParams',
'TraceRequestRedirectParams'
'TraceConfig', 'TraceRequestStartParams',
Member

Do you really need to touch all these lines?
Appending the two classes to the end of the sequence should be enough.

Contributor Author

Ok, reverted and added new classes at the end

@@ -147,6 +147,20 @@ TraceConfig

``params`` is :class:`aiohttp.TraceRequestStartParams` instance.

.. attribute:: on_request_chunk_sent
Member

Please also update the flow diagrams at the beginning of the file.

Contributor Author

Done. Here I added the new signals to the description. Please let me know if you meant something more elaborate.
[screenshot from 2018-02-27 15-04-09]
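
(For readers of those docs, end-user code for the two new signals looks along these lines; the handler signature follows the existing TraceConfig convention of (session, context, params), and the URL is just an example endpoint:)

    import asyncio
    import aiohttp

    async def on_request_chunk_sent(session, ctx, params):
        # params is a TraceRequestChunkSentParams; params.chunk holds the bytes
        print("request chunk sent:", len(params.chunk), "bytes")

    async def on_response_chunk_received(session, ctx, params):
        print("response chunk received:", len(params.chunk), "bytes")

    async def main():
        trace_config = aiohttp.TraceConfig()
        trace_config.on_request_chunk_sent.append(on_request_chunk_sent)
        trace_config.on_response_chunk_received.append(on_response_chunk_received)
        async with aiohttp.ClientSession(trace_configs=[trace_config]) as session:
            async with session.post("https://httpbin.org/post", data=b"payload") as resp:
                await resp.read()

    asyncio.run(main())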

@asvetlov (Member)

Not squashing is fine, I'll do it on merging anyway. Well, the GitHub button will do it, actually.

@kowalski kowalski force-pushed the feature/add-signals-for-reqres-chunks branch from c6a1ebd to 6e3819f Compare February 27, 2018 14:02
@kxepal (Member) left a comment

A few more notes.

@@ -168,7 +168,8 @@ def __init__(self, method, url, *,
proxy=None, proxy_auth=None,
timer=None, session=None, auto_decompress=True,
ssl=None,
proxy_headers=None):
proxy_headers=None,
traces=[]):
Member

Let's not use mutables as defaults. These may cause some nasty bugs we could easily avoid.

Contributor Author

true, changed to None
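
(The classic pitfall being avoided, shown standalone with generic names:)

    def buggy(item, bucket=[]):        # one list, created at definition time, shared by all calls
        bucket.append(item)
        return bucket

    def fixed(item, bucket=None):      # a fresh list per call
        if bucket is None:
            bucket = []
        bucket.append(item)
        return bucket

    print(buggy(1), buggy(2))          # [1, 2] [1, 2] -- state leaks between calls
    print(fixed(1), fixed(2))          # [1] [2]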

@@ -572,6 +583,7 @@ def __init__(self, method, url, *,
self._timer = timer if timer is not None else TimerNoop()
self._auto_decompress = auto_decompress
self._cache = {} # reqired for @reify method decorator
self._traces = traces
Member

Why is traces a public attribute in ClientRequest while in ClientResponse it's private?

Member

I can answer: ClientResponse is a public class visible to the user.
The attribute should be private.
About the request I don't care.

Member

Ok, fine for me.

@kowalski (Contributor Author) Feb 27, 2018

Agreed. I've now switched to _traces in both classes.

@@ -555,7 +565,8 @@ class ClientResponse(HeadersMixin):

def __init__(self, method, url, *,
writer=None, continue100=None, timer=None,
request_info=None, auto_decompress=True):
request_info=None, auto_decompress=True,
traces=[]):
Member

Same defaults story.

Contributor Author

also changed to None

@@ -34,8 +34,8 @@ Overview
exception[shape=flowchart.terminator, description="on_request_exception"];

acquire_connection[description="Connection acquiring"];
got_response;
send_request;
got_response[description="on_response_chunk_received"];
Member

Please rename the got_response and send_request nodes to be closer to the actual signal names. Maybe those long names could be shortened on the diagram -- be creative.

Also, I would like to see arrows that explicitly show that the chunk_sent and chunk_received events can occur more than once.

@asvetlov (Member)

Wait, send_request and got_response in my mind are mostly for headers. We need different nodes for body chunks.

@asvetlov (Member)

Please add .. versionadded:: 3.1 for the new signals and data classes.

setup.cfg Outdated
@@ -1,5 +1,6 @@
[pep8]
max-line-length=79
ignore=E225,E226
Member

I doubt that we should ignore these. Any reason to disable them globally?

Contributor Author

Well, I'm not in a position to judge this. I did it to be able to save a file which has the 64*1024 literal without spaces around the *. Otherwise autopep8 just fixes it for me.

Member

That autopep suggestion looks correct. You may also write 64000 instead without any loss in readability.

Contributor Author

I'm with you. @asvetlov specifically asked me to keep it as it was.

Member

For me the strange thing is that the flake8 checks pass without the ignore setting.
I'm totally fine with 64000 or, even better, 0x10000. Pretty sure it should be 64 KiB rather than 64 kB.
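
(For the record, the quantities being discussed:)

    >>> 64 * 1024 == 0x10000 == 65536   # 64 KiB
    True
    >>> 64 * 1000                        # 64 kB -- a different, smaller value
    64000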

Member

Please drop the change. Feel free to replace the limit with 0x10000 if needed -- I'm +-0 for it.

Contributor Author

done

@kowalski (Contributor Author)

@asvetlov
If I split the nodes for headers and chunks, this is what I get. What do you think?
[screenshot from 2018-02-27 15-25-33]

@asvetlov (Member)

The diagram looks better, but I don't see arrows from the sent/received nodes to exception.
Maybe the resulting picture will be unreadable; in that case I suggest extracting the headers/body signals into a separate diagram -- like I did for connection establishment and DNS lookup.
I don't know -- please do your best, but if you fail I'll polish the diagram after merging.

Also, the diagram shows that maybe we need on_request_headers_sent and on_response_headers_received signals :) -- but in a separate PR, obviously.

@@ -147,6 +155,23 @@ TraceConfig

``params`` is :class:`aiohttp.TraceRequestStartParams` instance.

.. attribute:: on_request_chunk_sent

.. versionadded:: 3.1
Member

Move the line after the signal description, below the "``params`` is :class:`aiohttp.TraceRequestChunkSentParams` instance." line.

This is our style guide for the aiohttp documentation.

when a chunk of response body is received.

``params`` is :class:`aiohttp.TraceResponseChunkReceivedParams` instance.

Member

Please add versionadded here as well

Contributor Author

done

@kxepal (Member)

kxepal commented Feb 28, 2018

@asvetlov
OK, approving so as not to be a blocker. It seems you're on the pulse of further changes here.

@asvetlov (Member)

@kxepal thanks

@pfreixes (Contributor)

Just missing the proper tests in test_http_write.py and test_client_response.py that check the changes at the function level, for the default path - traces and on_chunk_sent as None - and the non-default path. We already have this for test_connection.py right now.

I know that the new handlers are tested globally via test_client_session.py, but that's more of an integration test; local tests allow us to check all branches explicitly.

@asvetlov (Member)

@pfreixes agreed.
@kowalski please add the required tests.

@kowalski (Contributor Author)

kowalski commented Mar 1, 2018

all done

@pfreixes (Contributor)

pfreixes commented Mar 1, 2018

Thanks !

@asvetlov (Member) left a comment

Excellent work!

@asvetlov (Member) left a comment

Oh wait.
One last point is missing.
Please add a ./CHANGES log record.

@kowalski (Contributor Author)

kowalski commented Mar 1, 2018

@asvetlov done (added ./CHANGES/2767.feature)

@asvetlov (Member) left a comment

Thanks

@asvetlov (Member)

asvetlov commented Mar 1, 2018

Great job, thanks to everyone.
Will merge when tests pass.

@lock

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel there are important points made in this discussion, please include those excerpts in that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Oct 28, 2019