Skip to content
This repository has been archived by the owner on Feb 26, 2024. It is now read-only.

Enhancement: Put events in txlog (txlog part 2) #4882

Merged
merged 14 commits into from
May 17, 2022
Merged

Enhancement: Put events in txlog (txlog part 2) #4882

merged 14 commits into from
May 17, 2022

Conversation

haltman-at
Copy link
Contributor

@haltman-at haltman-at commented Mar 17, 2022

Addresses #3824. (Well, the first part of it -- probably best to close the issue once this is released, and I'll make a second issue for the second part of it, i.e., the flattened log and the CLI command to show it.)

This PR adds events to the txlog. Specifically, it creates a new type of action, type "event". This new type of action does not itself have an actions subfield, note, because it doesn't represent any sort of call, that might have subactions; it simply represents the logging of an event, something that cannot have subactions.

In fact, this new type of action has only one subfield other than type, which is decoding, a LogDecoding as output by Truffle Codec. All txlog information about the event is found in here. (We don't explicitly record the event's address or anything like that, because, well, we have the entire tree structure of the txlog; that just isn't necessary.)

So, how does this work, then?

Well, first off, I added a decodeLog saga in the data sagas, similar to decodeReturnValue and other similar functions. This saga invokes Codec's decodeEvent functionality to decode an event being emitted on the current step. There are a few things worth noting here:

  1. The actual information to be decoded is passed in as part of state; I added new selectors data.current.state.eventdata and data.current.state.eventtopics. This is a bit implicit, but it's similar to how the other similar sagas work, and I put comments on it. These two selectors themselves are based on new selectors in the evm submodule. The one for getting the event data looks at the top two arguments on the stack and then extracts the data from memory; the one for getting the event topics gets the count of topics from the name of the current opcode (LOG0 has 0 topics, LOG1 has 1 topic, etc), and then gets them off of the stack.
  2. We pass the id option. Because we have the full power of the debugger here, we don't have to decode the event based only on the things the decoder would know (who emitted it and what it contains); instead we can look at where in the source code it was emitted. This gives us an ID (determined in the selector data.current.eventId, similar to the already-existing data.current.errorId), and we can pass this to Codec to tell it that we're looking for an event with that specific ID. If it successfully decodes the event as that, great! Only that decoding will be returned; others will be ignored. If however we can't determine the ID, or the ID doesn't work, then the ID passed in will be ignored and we'll get back a list of interpretations, of which there may be more than one, as usual. (Of course, if the event isn't ambiguous in the first place, then none of this is relevant!)
  3. We pass null for the address. This is a new capability I just added to Codec's decodeEvent function. See, normally decodeEvent will attempt to determine what context emitted the event based on the address passed in. But in our case, we already know what context emitted the event! So I made it so that if you pass null for the address, the context-determination logic will be bypassed, in favor of using the context passed in for currentContext.
  4. Speaking of which -- it turns out that data.views.contexts contained a nasty bug... it was indexing the contexts by contract ID instead of context hash! I presume that I did this for a reason, but nothing relies on this behavior, so I changed it. This never came up before because there's only one place that Codec ever directly indexes into contexts, and that's in decodeEvent, a function that the debugger never used previously!
  5. The other change to data.views.contexts, though, is that it now includes both constructor and deployed contexts. Events can be emitted from constructors, after all, so Codec had better be informed about those!

So, that's the new decodeLog saga. But, obviously, this saga does nothing by itself! The txlog saga has been updated to look for LOG instructions (there's an evm selector for that, obviously) and invoke decodeLog when this occurs, then put an event action in the txlog. This is mostly pretty straightforward and similar to other cases in txlog, but there are a few things worth noting:

  1. I renamed txlog.current.nextCallPointer to txlog.current.nextActionPointer, since the next action need not be a call any longer.
  2. You'll notice that newPointer is used, even though the txlog pointer is not updated; it's used to determine where to create the new event. This is basically how all new action types, that are not calls or returns and which do not have subactions, will work. Events are basically the first instance of the "generic case".
  3. You'll notice that if there are multiple possible decodings, we just grab the first, similarly to how we handle errors. I'm pretty sure we decided a while back that it's OK if txlog only gives one decoding for each event, similar to how it does errors? Of course, events aren't often ambiguous anyway, and when they are, the use of the id option will often disambiguate (although optimization may cause that to fail). I think this is probably fine, though?

I also added tests of the new functionality, of course.

But, OK, there's one more thing I need to discuss. And that's event allocation. Yes, the already complicated event allocation logic just got even more complicated. See, previously, events only ever needed to be associated to deployed contexts, but now we need to know about them for constructor contexts as well. Oh boy!

In order to accomplish this, I made three changes to getEventAllocations. (All changes were to that function, the outermost level; no changes were made to any of the functions it calls, such as getEventAllocationsForContract.) The first is that now, instead of it using deployed context hash as the primary key if it exists and compilation-ID-and-AST-ID otherwise, it now checks deployed context hash, then constructor context hash, and only then falls back on compilation ID and AST ID. (Also, it now only skips a contract if it lacks all three of those, rather than if it lacks a deployed context and an AST node.)

The second is that there's a new flag (turned off by default), allowConstructorEvents. Only the debugger passes this; the decoder doesn't.

The third change is that, if a contract has both a deployed context and a constructor context, I set up a "swap map" that associates the two. Then, at the end, when I assign event allocations to contexts, if the allowConstructorEvents flag is set, and the contract is not a library, I not only assign it to the context in the allocation, but also its paired one in the swap map.

Why the library restriction, and why the flag? Well, the thing is that library events are always considered regardless of context, so if we put it in there twice, it will appear twice in the decodings. We don't want that! There are potentially various ways to fix this, but I figured the simplest was to just turn this mechanism off for libraries. After all, library constructors don't emit events!

OK, but why the flag? Because even without libraries, we'd get the same problem if you decoded events with extras turned on! Again, there are various ways to approach this problem, but a flag seemed the simplest... the debugger can pass this flag, since it has no need for extras, while the decoder, which does need to handle the case of extras, can leave it turned off. Maybe a bit ugly, but it works.

Sorry, I realize all the event allocation logic is a bit confusing, but I'm hoping this all makes sense!

But with all that, well, we now have events in the txlog! Not in the CLI yet though, that'll be a separate PR. :)

While I was at it, btw, I made some other internal improvements:

  1. I reordered the functions in the data sagas; the current order was a mess and made it difficult to find anything.
  2. I had Codec.Export import and re-export unsafeNativizeVariables; there was no reason to exclude this earlier, I just forgot about it.
  3. I added a check to data.current.errorId, parallel to the check I put in data.current.eventId.

@haltman-at haltman-at changed the title Enhancements: Put events in txlog (txlog part 2) Enhancement: Put events in txlog (txlog part 2) Mar 17, 2022
@haltman-at haltman-at force-pushed the txlog2 branch 2 times, most recently from bb0b75f to cdeb6cd Compare April 7, 2022 03:21
@haltman-at
Copy link
Contributor Author

OK, so, this ended up getting a bit more complicated!

See, there's a problem: Event allocation needs to account for inheritance, and that means it needs to account for interfaces and abstract contracts, which don't have contexts.

This isn't a problem for the event allocation code in codec! It was written with that in mind, and the decoder handles this just fine. The problem was specifically with the debugger -- it had no way of keeping track of ABIs without contexts. So, I added one.

Now, instead of just contexts and sources, there's also contracts. The contracts state lives in data, since that's where it's used. (Even though it's in data, I have it active even in light mode, in order to keep the code simple... it's not like adding it in some big deal.)

The contracts state is indexed first by compilation ID and then by AST ID. It has compilation ID, AST ID, ABI, and the deployed and constructor context hashes. Notionally it could contain much more -- primary source index, contract name, contract kind, compiler, linearized base contracts,, primary language... but at @gnidan's request to avoid duplication I left all that out.

This new state is currently used only for determining allocation info; it's not used anywhere else. We may want to discuss in the future whether we want to potentially redivide the state in a better way.

I also added a test of decoding events inherited from interfaces.

@gnidan
Copy link
Contributor

gnidan commented May 4, 2022

  • We pass null for the address. This is a new capability I just added to Codec's decodeEvent function. See, normally decodeEvent will attempt to determine what context emitted the event based on the address passed in. But in our case, we already know what context emitted the event! So I made it so that if you pass null for the address, the context-determination logic will be bypassed, in favor of using the context passed in for currentContext.

This could have been a separate PR :)

  • Speaking of which -- it turns out that data.views.contexts contained a nasty bug... it was indexing the contexts by contract ID instead of context hash! I presume that I did this for a reason, but nothing relies on this behavior, so I changed it. This never came up before because there's only one place that Codec ever directly indexes into contexts, and that's in decodeEvent, a function that the debugger never used previously!

Similarly, bug fixes certainly should get to develop first. Esp. fixes for nasty bugs.

The second is that there's a new flag (turned off by default), allowConstructorEvents. Only the debugger passes this; the decoder doesn't.

Maybe same with this one? (I mention because large PRs waiting on review from me are often bottlenecks)

Anyway, one other initial comment:

The third change is that, if a contract has both a deployed context and a constructor context, I set up a "swap map" that associates the two. Then, at the end, when I assign event allocations to contexts, if the allowConstructorEvents flag is set, and the contract is not a library, I not only assign it to the context in the allocation, but also its paired one in the swap map.

Sorry if this should be self-evident, but could you explain how this swap map is used and why it's necessary?

Apart from that, your description makes sense, although I do fear for how complex some of this indexing of contracts/contexts/etc. is becoming. Is there a way we could unify this sort of lookup? Like, it seems to me that we might benefit from having a first-class lookup system that handles all of these things cohesively, rather than a bunch of "try the context? ok try the constructor context? ok try the contracts table? etc.". Won't hold up this PR for that, but maybe I'll at least encourage you to open an issue about this ;)

... stay tuned, reviewing implementation next. thanks!

Copy link
Contributor

@gnidan gnidan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks sound, implementation-wise! Thanks for this Harry. Have a few questions/comments before I hit approve.

packages/codec/lib/abi-data/allocate/index.ts Show resolved Hide resolved
packages/debugger/lib/data/sagas/index.js Show resolved Hide resolved
packages/debugger/lib/txlog/reducers.js Show resolved Hide resolved
@haltman-at
Copy link
Contributor Author

  • We pass null for the address. This is a new capability I just added to Codec's decodeEvent function. See, normally decodeEvent will attempt to determine what context emitted the event based on the address passed in. But in our case, we already know what context emitted the event! So I made it so that if you pass null for the address, the context-determination logic will be bypassed, in favor of using the context passed in for currentContext.

This could have been a separate PR :)

There's no reason for it by itself, though? We never needed such a thing until this PR.

  • Speaking of which -- it turns out that data.views.contexts contained a nasty bug... it was indexing the contexts by contract ID instead of context hash! I presume that I did this for a reason, but nothing relies on this behavior, so I changed it. This never came up before because there's only one place that Codec ever directly indexes into contexts, and that's in decodeEvent, a function that the debugger never used previously!

Similarly, bug fixes certainly should get to develop first. Esp. fixes for nasty bugs.

I mean, it's a bug that affected absolutely nothing until I wrote the rest of this.

Anyway, one other initial comment:

The third change is that, if a contract has both a deployed context and a constructor context, I set up a "swap map" that associates the two. Then, at the end, when I assign event allocations to contexts, if the allowConstructorEvents flag is set, and the contract is not a library, I not only assign it to the context in the allocation, but also its paired one in the swap map.

Sorry if this should be self-evident, but could you explain how this swap map is used and why it's necessary?

Sure, I can expand on that!

The idea is that our normal processing associates each event with a particular context (or perhaps no context, but we don't care about those ones). But, if an event is supposed to be associated with a particular contract, it should be associated with both its contexts, both constructor and deployed. The swap map associates each context with the other one for the same contract (assuming this exists). So when we add an event to a context, we use the swap map to also add it to the other context for the same contract, so it can be associated with both. (Aside from the cases where we don't use it, of course. :P ) Does that answer your question?

Apart from that, your description makes sense, although I do fear for how complex some of this indexing of contracts/contexts/etc. is becoming. Is there a way we could unify this sort of lookup? Like, it seems to me that we might benefit from having a first-class lookup system that handles all of these things cohesively, rather than a bunch of "try the context? ok try the constructor context? ok try the contracts table? etc.". Won't hold up this PR for that, but maybe I'll at least encourage you to open an issue about this ;)

I mean, you could open an issue about it. :P

Anyway yeah some way to unify all this would be nice but this mess of a system precisely because I don't see how to do that at the moment...

@haltman-at haltman-at requested a review from gnidan May 5, 2022 00:47
@haltman-at
Copy link
Contributor Author

Btw @gnidan one more question for you: For external function calls, the format doesn't precisely match a decoding you'd get back from the decoder (for good reason!), but here I just used a single decoding field. That's not very consistent, but it seemed the best way... that's not a problem, right?

@gnidan
Copy link
Contributor

gnidan commented May 5, 2022

Btw @gnidan one more question for you: For external function calls, the format doesn't precisely match a decoding you'd get back from the decoder (for good reason!), but here I just used a single decoding field. That's not very consistent, but it seemed the best way... that's not a problem, right?

Hm, that might be a problem. I like that we're coalescing around decoding objects being these things you can pass to inspectors, etc. The integration consistency has been nice the handful of times I've done it.

What are the good reasons for the differences?

@haltman-at
Copy link
Contributor Author

Hm, that might be a problem.

Well, if it is, it's a problem that already exists in txlog... probably this PR isn't the best place to address it.

I like that we're coalescing around decoding objects being these things you can pass to inspectors, etc. The integration consistency has been nice the handful of times I've done it.

So, using a decoding object like I did here is probably a good thing then, right? :)

What are the good reasons for the differences?

Hm, thinking it over some more, it might be possible to square things up more? Again, I think that's probably a matter for another PR if we want to do that. Offhand the hardest part to make consistent to my mind will be the "unwind" return type, which has no analogue in decoder.

@gnidan
Copy link
Contributor

gnidan commented May 17, 2022

The idea is that our normal processing associates each event with a particular context (or perhaps no context, but we don't care about those ones). But, if an event is supposed to be associated with a particular contract, it should be associated with both its contexts, both constructor and deployed. The swap map associates each context with the other one for the same contract (assuming this exists). So when we add an event to a context, we use the swap map to also add it to the other context for the same contract, so it can be associated with both. (Aside from the cases where we don't use it, of course. :P ) Does that answer your question?

Got it. Thank you!

Anyway yeah some way to unify all this would be nice but this mess of a system precisely because I don't see how to do that at the moment...

Yeah, alas. I'll make a note to open an issue.

Hm, that might be a problem.

Well, if it is, it's a problem that already exists in txlog... probably this PR isn't the best place to address it.

I like that we're coalescing around decoding objects being these things you can pass to inspectors, etc. The integration consistency has been nice the handful of times I've done it.

So, using a decoding object like I did here is probably a good thing then, right? :)

:)

What are the good reasons for the differences?

Hm, thinking it over some more, it might be possible to square things up more? Again, I think that's probably a matter for another PR if we want to do that. Offhand the hardest part to make consistent to my mind will be the "unwind" return type, which has no analogue in decoder.

Ah, I see. Is this problem far off from how there's the magic typeClass? That's part of the type unions returned by decoder, but it's never going to be produced.

Copy link
Contributor

@gnidan gnidan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to have held this up for no reason! Thanks @haltman-at!

@haltman-at
Copy link
Contributor Author

Ah, I see. Is this problem far off from how there's the magic typeClass? That's part of the type unions returned by decoder, but it's never going to be produced.

Hm, I wouldn't really say that's similar? But, uh, I guess we can discuss this later...

@haltman-at haltman-at merged commit 4796c87 into develop May 17, 2022
@haltman-at haltman-at deleted the txlog2 branch May 17, 2022 21:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants