This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Increase network buffer sizes even more #6080

Merged
2 commits merged into paritytech:master from tomaka:increase-buffers-moar
Jun 18, 2020

Conversation

tomaka
Contributor

@tomaka tomaka commented May 19, 2020

cc #6009

After #6064, it's pretty clear that the spikes in the network worker taking a long time are caused by the fact that the network worker is sleeping while waiting for a libp2p node task to empty the messages in its channel, and that during this sleep the events accumulate in the queues.

We still have spikes, so this PR increases these channel sizes even more, allowing for more parallelism between the network worker and the node tasks.
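For context, the change itself is conceptually tiny: bump the bound passed when creating the per-task channels. A minimal sketch, assuming futures-style channels; the names and numbers here are illustrative, the real constants live in the code this diff touches:

use futures::channel::mpsc;

// Illustrative value; before this PR the per-task buffer was around 16.
const EVENTS_QUEUE_SIZE: usize = 256;

// Placeholder standing in for the real worker-to-task event type.
type Event = u32;

fn node_task_channel() -> (mpsc::Sender<Event>, mpsc::Receiver<Event>) {
    // A larger bound lets the network worker enqueue more events before
    // it has to go to sleep waiting for the node task to drain them.
    mpsc::channel(EVENTS_QUEUE_SIZE)
}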

@tomaka tomaka added A2-insubstantial Pull request requires no code review (e.g., a sub-repository hash update). B0-silent Changes should not be mentioned in any release notes labels May 19, 2020
@arkpar
Member

arkpar commented May 19, 2020

Could you give an estimate on how this will increase memory usage?

@tomaka
Contributor Author

tomaka commented May 19, 2020

That's 16 more "worker to node" events per node task, plus 896 more "node to worker" events. Events are enums and all their fields are small, maybe around 32 bytes? So that'd be around 284 kiB more memory consumed.
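(Back-of-the-envelope check, assuming around 512 node tasks, a figure chosen here purely to make the arithmetic concrete: (16 × 512 + 896) events × 32 bytes = 290,816 bytes = 284 KiB.)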

@arkpar
Member

arkpar commented May 19, 2020

So events do not contain actual network messages or any kind of heap-allocated buffers?

@tomaka
Contributor Author

tomaka commented May 19, 2020

Well, the number I gave is the fixed memory overhead compared to today.

Events themselves contain heap-allocated messages, but it's impossible to estimate how that will impact memory usage.

The main point of this change is to prevent the network worker from sleeping, since events accumulate during these sleeps. The GrandPa messages, for instance, use an unbounded channel (#5481), and I expect this channel's usage to actually be consistently lower.

@tomaka
Contributor Author

tomaka commented May 19, 2020

To give an example, let's say GrandPa suddenly starts sending 25 messages to 50 different nodes (so 25 times 50 messages).

Before this PR, the network worker would queue 16 of these messages to the first node, then sleep. During this sleep, the (25 * 50 - 16) other messages would accumulate in the unbounded channel.
Then once the task of the first node starts processing its queue, the network worker will wake up, pull the next messages from the unbounded channel, enqueue these, go to sleep again, and so on.

After this PR, the intention is that the network worker would be able to continuously pull from the unbounded channel and queue all 25 messages to all 50 nodes without going to sleep. If, in parallel, nodes tasks wake up and drain the queue, the peak of memory will actually be lower than before.

The example figure of 25 messages is made up, but as long as these spikes show up in the graphs, it means that the channel sizes are not big enough.
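As a toy model of the dispatch step above (names, types and the bound are illustrative, not the real sc-network ones), the interesting moment is when try_send reports a full queue:

use futures::channel::mpsc;

// Toy stand-in for a GrandPa gossip message.
type Message = u32;

// Try to enqueue one message into one node task's bounded queue.
// Before this PR (bound = 16) the `is_full` branch was hit quickly,
// and the worker parked while the backlog grew in the unbounded channel.
fn enqueue(to_node: &mut mpsc::Sender<Message>, msg: Message) -> Result<(), Message> {
    match to_node.try_send(msg) {
        Ok(()) => Ok(()),
        // Queue full: hand the message back so it can be retried once
        // the node task has drained some events and the worker wakes up.
        Err(e) if e.is_full() => Err(e.into_inner()),
        // Node task has gone away; the message is dropped with it.
        Err(e) => Err(e.into_inner()),
    }
}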

@arkpar
Member

arkpar commented May 19, 2020

The main point of this change is to prevent the network worker from sleeping, since events accumulate during these sleeps.

Why would the network worker "sleep" and for how long?

During this sleep, the (25 * 50 - 16) other messages would accumulate in the unbounded channel.

What unbounded channel are you referring to? I thought the gossip queue is bounded and the network behaviour skips sending messages if it grows too large.

As far as I understand, this is a workaround for the fact that libp2p has a single choke point. An event stream for a single connection blocks the whole network worker, and effectively all other connections, if it grows too large. Would it be possible to fix that fundamental design flaw instead? E.g. by not dispatching everything through the network worker/swarm, but pushing these events directly to node tasks?

@arkpar arkpar removed the A2-insubstantial Pull request requires no code review (e.g., a sub-repository hash update). label May 20, 2020
@arkpar
Member

arkpar commented May 20, 2020

Also, I've just noticed that the gossip message batching I added in #4055 was silently removed by a later refactoring. I think it was a much more efficient way to reduce the number of network messages and, consequently, queued events.
Can we bring it back instead?

@tomaka
Contributor Author

tomaka commented May 20, 2020

Why would the network worker "sleep" and for how long?

It is sleeping because the queue of messages is full, and the only alternative to sleeping would be not to put a limit on the size of the channel, which is notoriously a bad idea (#5106).

What unbounded channel are you referring to? I thought the gossip queue is bounded and the network behaviour skips sending messages if it grows too large.

This channel: #5481

/// Channel that sends messages to the actual worker.
to_worker: TracingUnboundedSender<ServiceToWorkerMsg<B, H>>,

The channel between the gossip machine and the network is unbounded, and making it bounded would likely require an epic refactoring.
It is unfortunately a logic error to drop messages, and the skipping of messages that you're talking about is a hack to avoid a memory leak. In order to function correctly, GrandPa at the moment expects no message to ever be dropped.
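To make the "epic refactoring" point concrete, here is a hedged sketch of the difference at each send site; ServiceToWorkerMsg is the real type name from the snippet above, everything else is simplified:

use futures::channel::mpsc;
use futures::SinkExt;

// Stands in for the real enum of messages to the worker.
struct ServiceToWorkerMsg;

// Today: unbounded, never waits, never fails for lack of capacity,
// so the callers don't need to be async at all.
fn send_today(tx: &mpsc::UnboundedSender<ServiceToWorkerMsg>, msg: ServiceToWorkerMsg) {
    let _ = tx.unbounded_send(msg);
}

// Bounded: every send site becomes an await point that can park the
// caller, which is exactly the back-pressure GrandPa cannot yet handle.
async fn send_bounded(tx: &mut mpsc::Sender<ServiceToWorkerMsg>, msg: ServiceToWorkerMsg) {
    let _ = tx.send(msg).await;
}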

Would it be possible to fix that fundamental design flaw instead? E.g. by not dispatching everything through the network worker/swarm, but pushing these events directly to node tasks?

While sending messages directly to the node task could be a positive optimization, having a single choke point is by design. The single choke point is how we avoid going completely crazy with different parts of the code having different views of the state of a connection.

Considering that GrandPa is single-threaded anyway (it is a "choke point" by itself), and provided that buffers are large enough (which is what this PR is for), the only drawback of that choke point right now is some added latency when delivering messages (which should be on the order of a few microseconds if things work well).

@arkpar
Member

arkpar commented May 20, 2020

I don't see how this has anything to do with the state of the connection. The problem is this weird polling model adopted all over the networking code. Let's say a protocol behaviour needs to send 10 messages to each of the 10 peers we are connected to. The current design of libp2p forces this to be implemented in the following way:

fn send_a_bunch_of_messages(&mut self, messages: Vec<Message>) {
    self.some_queue.extend(messages)
}

fn poll(&mut self) -> Option<Message> {
    // Return at most one message per call.
    self.some_queue.pop_front()
}

Later, libp2p calls poll in a loop and distributes messages to handlers. Except that if one of them is full, it simply stops, and all the others have to wait. That's what I mean by the choke point. It has nothing to do with grandpa being single-threaded, as it is a design issue with libp2p itself.
This leads to a lot of queues, such as some_queue, all over the codebase that are hard to reason about and follow.

I would propose that instead of returning messages that need to be sent out from poll one by one, there would be an object that dispatches messages internally to handlers right away, without the poll loop.

fn send_a_bunch_of_messages(&mut self, messages: Vec<Message>) {
    for m in messages {
        if self.swarm.try_send_to(m.peer, m.data) == Full {
            // Handle the individual peer connection being too slow here.
        }
    }
}

This does not introduce any new state, but simply changes how messages are dispatched to node tasks, does it?

@tomaka
Contributor Author

tomaka commented May 20, 2020

I don't see how this has anything to do with the state of the connection.

The network worker (which includes the libp2p Swarm) exists because it is the single-threaded authority which decides which connections are opening, open, used, not used, and so on. That's the reason why all the messages go through it, as its job is to relay these messages to the right connection.

This by itself doesn't mean that we can't bypass that blocking behaviour, but it is definitely not unrelated.

Later, libp2p calls poll in a loop and distributes messages to handlers. Except that if one of them is full, it simply stops, and all the others have to wait. That's what I mean by the choke point. It has nothing to do with grandpa being single-threaded, as it is a design issue with libp2p itself.

I mention GrandPa being single-threaded, because how it works now is:

Node <-\
Node <-|
Node <--> Network Worker <-> GrandPa
Node <-|
Node <-/

If GrandPa was multithreaded, yes the choke point would be very negative. At the moment, the only drawback of the choke point is the increased latency.

I would propose that instead of returning messages that need to be sent out from poll one by one, there would be an object that dispatches messages internally to handlers right away, without the poll loop.

That would indeed make the code more readable, but it is just moving the problem and not solving it.
You can't just call send_message() an arbitrary number of times and expect the code to:

  • Deliver all messages successfully.
  • Deliver all messages without sleeping on the sending side.
  • Have a cap on the memory usage.

Having all three constraints at the same time is physically impossible.

// Handle the individual peer connection being too slow here.

That is exactly the root of the problem. How do you handle an individual node task being slow? Putting a comment won't magically solve that problem.
As a reminder, this has nothing to do with the connection to the peer itself being slow. Node tasks have to accept all events unconditionally and as soon as possible, and when events are accepted by the node task is purely a matter of when the tokio scheduler will schedule the task.

The point of the buffer size increase in this PR is to avoid hitting a full buffer while tokio schedules the task.

@tomaka
Contributor Author

tomaka commented May 20, 2020

To give a comparison: TCP connections use a window system for proper back-pressure.
If the window size were too small, you could only transfer a few hundred bytes before waiting for an ACK, and TCP connections in general would be very slow.
But you solve that by increasing the window size until proper speed is achieved, and not by removing congestion control altogether.
The same applies here, but for messages rather than packets of data.
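(For the analogy's numbers: maximum throughput is bounded by window_size / round_trip_time, the bandwidth-delay product. With a 64 KiB window and a 100 ms RTT, a sender cannot exceed roughly 640 KiB/s no matter how fast the link is; the cure is a bigger window, not the removal of the window.)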

@arkpar
Member

arkpar commented May 20, 2020

As a reminder, this has nothing to do with the connection to the peer itself being slow. Node tasks have to accept all events unconditionally and as soon as possible

Why? That does not make any sense. If the peer connection is only 1 kbps and we are trying to send messages at a rate of 1 mbps, the outbound backpressure needs to be propagated to the code that actually tries to send the message, because this is where we know how to handle the slow connection. In the case of grandpa, this is where it may decide what to do with a slow peer, e.g. only send important round messages, skip a round, or simply drop the connection.

The approach that I've suggested gives the protocol behaviour the flexibility to handle outbound connection backpressure and prevents swarm::poll from blocking on a single task.

@arkpar
Member

arkpar commented May 20, 2020

To give a comparison: TCP connections use a window system for proper back-pressure.
If the window size were too small, you could only transfer a few hundred bytes before waiting for an ACK, and TCP connections in general would be very slow.
But you solve that by increasing the window size until proper speed is achieved, and not by removing congestion control altogether.
The same applies here, but for messages rather than packets of data.

This is a false analogy. TCP connections do not block on each other; libp2p task handlers do. If the network behaviour generates a lot of events for connection 1, connection 2 (and all of the networking) will be starved.

@tomaka
Contributor Author

tomaka commented May 20, 2020

Why? That does not make any sense.

It is the node task that must decide what to do with a flow of events that is too high.
The events are not opaque to the node task, in other words it knows which events are sync, which are grandpa, and so on, and can take the appropriate action.

In the case of grandpa, this is where it may decide what to do with a slow peer, e.g. only send important round messages, skip a round, or simply drop the connection.

That's what I mentioned above: GrandPa is not ready to deal with this, at all. Fixing it would require an epic ten-thousand-lines-of-code refactoring that we clearly can't do, and we have to deal with this constraint.

As I mentioned, we could in theory totally do some changes in the networking code to make GrandPa directly send events to node tasks, but for the foreseeable future we will have this constraint that GrandPa can't communicate with the network without introducing unbounded channels, and doing this change won't bring anything more than what this PR does.

@arkpar
Member

arkpar commented May 20, 2020

It is the node task that must decide what to do with a flow of events that is too high.
The events are not opaque to the node task, in other words it knows which events are sync, which are grandpa, and so on, and can take the appropriate action.

That does not really explain why it has to be this way. To me it makes much more sense to handle this in the code that actually decides what to send and to whom.

That's what I mentioned above: GrandPa is not ready to deal with this, at all. Fixing it would require an epic ten-thousand-lines-of-code refactoring that we clearly can't do, and we have to deal with this constraint.

Not convinced this is really that complicated. @mxinden @romanb @twittner Would be good to get your opinion on this.

@tomaka
Contributor Author

tomaka commented May 20, 2020

I'm going to try to summarize:

  • GrandPa sends an unbounded number of messages to the network worker destined to various nodes.
  • The network worker (which is normally supposed to be lightweight) is responsible for dispatching these messages to the correct task.
  • The tasks discard GrandPa messages if there are too many, which is a violation of the expectations of GrandPa, but is the pragmatic option right now.

The network worker is not supposed to be "back-pressured" (i.e. to sleep) itself because one of the tasks is slow. It normally exists only to relay between GrandPa and the tasks. That's why node tasks have to accept all events unconditionally.

While we could make GrandPa bypass the network worker altogether, it doesn't solve the problem of GrandPa sending an unbounded number of messages and the task potentially discarding them.

Supposing that GrandPa becomes ready to handle back-pressure, we would indeed have to modify the network worker and introduce new mechanisms to make it work. But in the meantime, I believe that introducing such a mechanism wouldn't bring anything more than what this PR does.

@arkpar
Member

arkpar commented May 20, 2020

But in the meantime, I believe that introducing such a mechanism wouldn't bring anything more than what this PR does.

I can see the following benefits:

  1. Get rid of the "blocking on a single task" behaviour.

  2. Remove notification queues all over the networking code. They've already proven to be a source of numerous hard-to-spot issues.

  3. Make handling backpressure more straightforward.

  4. Make the code simpler and easier to follow.

The way I see it, this can be done in 3 steps:

  1. Release a libp2p version that allows sending events directly through a swarm reference (or an object that encapsulates it). Sending is allowed to fail if the internal bounded event queue for the node task is full (see the sketch after this list).

  2. Gradually use it in Substrate, not only in gossip but in all protocols.

  3. Eventually drop NetworkBehaviourAction
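For step 1, a rough sketch of the kind of interface being proposed; every name here is hypothetical, this is not an existing libp2p API:

use std::collections::VecDeque;

// Error returned when a per-connection queue is at capacity.
struct Full;

// Hypothetical bounded per-node-task event queue that a behaviour could
// push at directly, observing back-pressure itself instead of funnelling
// everything through Swarm::poll.
struct ConnectionQueue<E> {
    queue: VecDeque<E>,
    capacity: usize,
}

impl<E> ConnectionQueue<E> {
    fn try_send(&mut self, event: E) -> Result<(), Full> {
        if self.queue.len() >= self.capacity {
            // The caller decides what to do about the slow peer.
            return Err(Full);
        }
        self.queue.push_back(event);
        Ok(())
    }
}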

@tomaka
Contributor Author

tomaka commented May 20, 2020

Release libp2p version that allow sending events directly through swarm reference (or an object that encapsulates it).

I do believe that the way libp2p works is fundamentally correct. Your proposal can be implemented by adding a channel within Substrate and does not require changes in libp2p.

I also don't think, in general, that it's a good idea to even consider refactoring half of libp2p and Substrate as an alternative to a two-line PR.

Remove notification queues all over the networking code. They've already proven to be a source of numerous hard-to-spot issues.

I'm only aware of one such hard-to-spot issue, and it is caused by the sync state machine working differently from the rest of the networking state machine.

@twittner
Contributor

That's what I mentioned above: GrandPa is not ready to deal with this, at all. Fixing it would require an epic ten-thousand-lines-of-code refactoring that we clearly can't do, and we have to deal with this constraint.

Not convinced this is really that complicated. @mxinden @romanb @twittner Would be good to get your opinion on this.

If we replace the unbounded channel with a bounded one in Substrate's NetworkWorker, then finality-grandpa must be able to handle the back-pressure. See https://github.com/twittner/substrate/tree/issue-5481 for an initial attempt to replace the channel.

How do you handle an individual node task being slow? Putting a comment won't magically solve that problem. As a reminder, this has nothing to do with the connection to the peer itself being slow. Node tasks have to accept all events unconditionally and as soon as possible, and when events are accepted by the node task is purely a matter of when the tokio scheduler will schedule the task.

[...]

It is the node task that must decide what to do with a flow of events that is too high.
The events are not opaque to the node task, in other words it knows which events are sync, which are grandpa, and so on, and can take the appropriate action.

When you say "node task", what are you referring to exactly? Surely it can not be libp2p's task.rs.

@romanb
Contributor

romanb commented May 20, 2020

Please treat all my comments below with a grain of salt, as I may not have the full picture of all the details that are being discussed here.

After #6064, it's pretty clear that the spikes in the network worker's processing time are caused by the network worker sleeping while it waits for a libp2p node task to drain the messages in its channel, and that during this sleep events accumulate in the queues.

This is due to the single pending_event in the Swarm, right? And increasing the buffer sizes in libp2p-core is basically a countermeasure to reduce the likelihood of the pending_event in the Swarm not being able to be delivered? So this is the "outgoing" direction, e.g. for sending something to a peer (via the behaviour emitting an event). And since the Swarm drives the underlying Network and NetworkBehaviour in tandem, a single connection being slow to send stuff even results in all connection handlers receiving events slowly?

I'd just like to differentiate the problem, because I don't currently think that the buffers themselves in libp2p-core are a problem: using libp2p-core directly, you can send events to specific connections directly, so one connection's background task being slow to make progress does not need to stop the user from sending events to other connections. In other words, this seems to be a problem of libp2p-swarm. I would tend to agree that the current design of libp2p-swarm introduces what seems to be an artificial bottleneck, caused primarily, as far as I can see, by the inability of a Swarm to communicate back-pressure between itself and the NetworkBehaviour, e.g. when the behaviour emits an event for a connection, the Swarm must deliver or buffer it, but cannot "give it back" to the behaviour if the connection is busy.

As far as this PR is concerned, and if my understanding of the problem(s) is correct, increasing the buffer sizes seems like a short-term stop-gap measure that may be fine for now, but certainly not a desirable long-term solution; as stated initially, though, I may not have the full picture. I opened libp2p/rust-libp2p#1585 for some related discussion.

@tomaka
Contributor Author

tomaka commented May 20, 2020

When you say "node task", what are you referring to exactly? Surely it can not be libp2p's task.rs.

I'm indeed referring to the tasks spawned in task.rs. By "node task", I'm also including the code in sc_network/src/protocol/generic_proto/handler.rs, which is where we know which event corresponds to what.

This is due to the single pending_event in the Swarm, right? And increasing the buffer sizes in libp2p-core is basically a countermeasure to reduce the likelihood of the pending_event in the Swarm not being able to be delivered? So this is the "outgoing" direction, e.g. for sending something to a peer (via the behaviour emitting an event). And since the Swarm drives the underlying Network and NetworkBehaviour in tandem, a single connection being slow to send stuff even results in all connection handlers receiving events slowly?

Correct.

I would tend to agree that the current design of libp2p-swarm introduces what seems to be an artificial bottleneck, caused primarily, as far as I can see, by the inability of a Swarm to communicate back-pressure between itself and the NetworkBehaviour, e.g. when the behaviour emits an event for a connection, the Swarm must deliver or buffer it, but cannot "give it back" to the behaviour if the connection is busy.

The reason why ProtocolsHandler::inject_event is a synchronous method is that we want to make it mandatory for a ProtocolsHandler to immediately accept events that are being sent to it.
If the channel that transmits messages to the node task is full, the Swarm only has to sleep until the executor (e.g. tokio) schedules the node task.

(the exception to that is if there's some CPU intensive action in a node task, or if it is stuck in an infinite loop, which thankfully doesn't happen, but we should eventually have some protection measure here)

It is also pretty important for the ProtocolsHandler to immediately accept all events to prevent a potential deadlock where the node task is blocked trying to deliver an event to the Swarm while the Swarm is blocked trying to deliver an event to that node task.

While giving it back to the NetworkBehaviour could be a solution, it moves the complexity from the Swarm to the NetworkBehaviour.
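For illustration, a sketch of the two-party cycle that rule prevents (illustrative channel types, not the real handler plumbing):

use futures::channel::mpsc;
use futures::SinkExt;

// If the node task were allowed to park while delivering to the Swarm,
// and the Swarm parks while delivering to the node task, neither side
// ever drains its inbox and both wait forever.
async fn node_task_side(mut to_swarm: mpsc::Sender<u32>) {
    // Suppose `to_swarm` is full: this await parks the node task, so it
    // stops reading its own inbox; a Swarm parked on sending into that
    // inbox then never drains `to_swarm`. Making inject_event synchronous
    // guarantees the node-task side always keeps consuming.
    let _ = to_swarm.send(1).await;
}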

@romanb
Contributor

romanb commented May 20, 2020

The reason why ProtocolsHandler::inject_event is a synchronous method [..]

I don't actually think this is part of the problem, is it? As these calls are scoped to a particular connection (and background task). As far as I currently understand, the problem is rather that NetworkBehaviour::poll and NetworkBehaviour::inject_event do not permit communicating (connection-specific) busyness / back-pressure between the network and behaviour via the swarm, and so essentially the Swarm can only progress as fast as the slowest connection (+ leeway due to buffers) if it does not wish to offload the problem of back-pressure onto the behaviour.

@tomaka
Contributor Author

tomaka commented May 20, 2020

I don't actually think this is part of the problem, is it?

I do think it is important. The fact that ProtocolsHandler::inject_event is synchronous means that there is a finite time before the Swarm will be able to wake up.

and so essentially the Swarm can only progress as fast as the slowest connection (+ leeway due to buffers) if it does not wish to offload the problem of back-pressure onto the behaviour.

Yes, but an important point is that it is not the slowest connection, as that wording gives the impression that this depends on the bandwidth of the connection, which is not the case at all. It is rather based on the CPU usage of the node task. Node tasks are supposed to be idle 99% of the time.
If the Swarm goes to sleep, it is only to wait until tokio (or whatever executor we use) schedules the relevant node task.

@arkpar
Member

arkpar commented May 20, 2020

Yes, but an important point is that it is not the slowest connection, as that wording gives the impression that this depends on the bandwidth of the connection, which is not the case at all. It is rather based on the CPU usage of the node task.

Where is send called on the TCP socket? Does this happen in the node task? And if so, is the socket open in blocking mode?

Regardless, the only reason to have a queue in the task handler is for packets waiting for their turn to be sent, isn't it? What other possible causes are there for anything to be queued there? All other types of events could simply be processed synchronously.

@gavofyork gavofyork added the A0-please_review Pull request needs code review. label May 20, 2020
@tomaka
Contributor Author

tomaka commented May 20, 2020

Where is send called on the TCP socket? Does this happen in the node task? And if so, is the socket open in blocking mode?

That's indeed in the node task, and done by mio, the non-blocking sockets library.
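To illustrate what non-blocking means on the write path (plain std sockets here for simplicity; mio wraps the same mechanism behind event-driven readiness):

use std::io::{self, Write};
use std::net::TcpStream;

// If the kernel send buffer is full, a non-blocking write returns
// WouldBlock instead of parking the thread; the task then registers
// interest in writability and yields until the socket drains.
fn try_write(stream: &mut TcpStream, buf: &[u8]) -> io::Result<Option<usize>> {
    stream.set_nonblocking(true)?;
    match stream.write(buf) {
        Ok(n) => Ok(Some(n)),
        Err(e) if e.kind() == io::ErrorKind::WouldBlock => Ok(None), // retry later
        Err(e) => Err(e),
    }
}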

@arkpar
Member

arkpar commented May 20, 2020

Where is send called on the TCP socket? Does this happen in the node task? And if so, is the socket open in blocking mode?

That's indeed in the node task, and done by mio, the non-blocking sockets library.

It's non-blocking for receiving, sure. But I could not confirm that writes are non-blocking too. Anyway, the queue in the node handler grows when the socket can't send messages fast enough, doesn't it? Or are there any other reasons for a handler not to accept an event?

@arkpar
Member

arkpar commented Jun 5, 2020

Let's test this on a sentry node with ~150 connections for a couple of days and see if the memory usage increase is significant.

@gnunicorn gnunicorn added the I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. label Jun 9, 2020
@gavofyork gavofyork removed the I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. label Jun 10, 2020
@tomaka tomaka added the C1-low PR touches the given topic and has a low impact on builders. label Jun 18, 2020
@tomaka
Contributor Author

tomaka commented Jun 18, 2020

This PR got deployed on a sentry node around midnight.

The CPU and memory usage look the same before and after the deployment (they vary a lot over time under normal circumstances, so any difference before and after this PR is completely lost in the background noise).

Here is the CPU load of the network worker specifically, in blue:

[Screenshot from 2020-06-18 10-06-46]

One can see that the spikes are indeed lower.

These are the messages that the GrandPa message-processing task receives:

[Screenshot from 2020-06-18 10-10-50]

Again, it's smoother. The spikes have disappeared.

To summarize the situation, I totally agree that we should make notifications go directly to the background task (and I'm slowly working towards this), but I do believe that this PR improves the situation at basically no cost.

Most likely the additional memory consumption is compensated to some degree by the fact that the unbounded channels, for example the one towards GrandPa, consume less memory in the absence of spikes.

@gnunicorn
Contributor

bot merge

@ghost

ghost commented Jun 18, 2020

Trying merge.

@ghost ghost merged commit 44978b9 into paritytech:master Jun 18, 2020
@tomaka tomaka deleted the increase-buffers-moar branch June 18, 2020 12:01
drewstone added a commit to hicommonwealth/substrate that referenced this pull request Jun 23, 2020
* Fix typo: eror -> error (paritytech#6293)

* Fix typo: PRORITY -> PRIORITY (paritytech#6291)

* Intent to release rc3 (paritytech#6290)

* Fix transaction pool & network issues (paritytech#6288)

* fix & tweaks

* address review

* line width

* Use `sign_with` for signing grandpa's outgoing message (paritytech#6178)

* Use sign_with and stop using `Pair`

* PR feedback

* Remove clone

* Transfer ownership of public to sign_message

* Use Option

* Simplify code

* Fix error message

* Pass keystore as ref

* Pass keystore properly

* Fix tests

* Revalidation tweak & logging for transaction pool (paritytech#6258)

* updates and logging

* fix length

* Update client/transaction-pool/src/lib.rs

* rename

* Update client/transaction-pool/src/lib.rs

Co-authored-by: Bastian Köcher <[email protected]>

Co-authored-by: Bastian Köcher <[email protected]>

* Update README.md

* Allow adding a prefix to the informant (paritytech#6174)

* Initial commit

Forked at: 49b1561
Parent branch: origin/master

* Add a Service Configuration's field + adapt informant + provide means to CLI

* CLEANUP

Forked at: 49b1561
Parent branch: origin/master

* fix tests

* fixed bad path to object

* Change OutputFormat enum to struct

* Add informant_prefix to builder and service

* Revert "Change OutputFormat enum to struct"

This reverts commit cd86c58.

* Revert "fix tests"

This reverts commit a3c306e.

* Revert "Add a Service Configuration's field + adapt informant + provide means to CLI"

This reverts commit 9c2e726.

* Implementation using the ServiceBuilder

* reduce line length

* fix line width again

* WIP

Forked at: 49b1561
Parent branch: origin/master

* WIP

Forked at: 49b1561
Parent branch: origin/master

* WIP

Forked at: 49b1561
Parent branch: origin/master

* use struct instead of enum

* WIP

Forked at: 49b1561
Parent branch: origin/master

* Update client/service/src/lib.rs

Co-authored-by: Bastian Köcher <[email protected]>

* improve doc

* Update client/service/src/builder.rs

Co-authored-by: Bastian Köcher <[email protected]>

* Update client/service/src/builder.rs

Co-authored-by: Bastian Köcher <[email protected]>

* change code

* Update client/informant/src/lib.rs

Co-authored-by: Bastian Köcher <[email protected]>

* enable_color

* reorg log

* remove macro

* Removed builder for informant prefix

* fix doc

* Update client/informant/src/lib.rs

Co-authored-by: Bastian Köcher <[email protected]>

* Update client/informant/src/lib.rs

Co-authored-by: Bastian Köcher <[email protected]>

* Update client/informant/src/lib.rs

Co-authored-by: Bastian Köcher <[email protected]>

* Update client/informant/src/lib.rs

Co-authored-by: Bastian Köcher <[email protected]>

* Update client/service/src/builder.rs

Co-authored-by: Bastian Köcher <[email protected]>

* Update client/service/src/builder.rs

Co-authored-by: Bastian Köcher <[email protected]>

* Update client/service/src/builder.rs

Co-authored-by: Bastian Köcher <[email protected]>

Co-authored-by: Bastian Köcher <[email protected]>

* Transaction pool added missed comment (paritytech#6308)

* Add a test for lots of nodes connecting at the same time (paritytech#6247)

* Add a test for lots of nodes connecting at the same time

* Do small change

* Introduce frozen indices. (paritytech#6307)

* Introduce frozen indices.

* Fix.

* Bump runtime

* Benchmark for freeze

* Fix

* fix benchmarks

* update freeze weights

* remove copy pasta

Co-authored-by: Shawn Tabrizi <[email protected]>

* new crate sc-light (paritytech#6235)

* sc-light

* remove unused deps

* fix line width

* move more fns to sc_light

* Fix ui tests for latest rust stable (paritytech#6310)

* Expose light client. (paritytech#6313)

* Fix nits in rpc error display. (paritytech#6302)

* Improve rpc error display.

* Apply review suggestion.

* Apply review suggestion.

* Update client/rpc-api/src/author/error.rs

* Fix custom.

Co-authored-by: Bastian Köcher <[email protected]>

* "OR gate" for EnsureOrigin (paritytech#6237)

* 'OR gate' for EnsureOrigin.

* Formatting.

* More formatting.

* Add docstring; Update 'Success' type.

* Bump runtime impl_version.

* Fix successful_origin.

* Add either into std feature list.

* Update docs.

* New CI image (paritytech#6223)

* fix (ci): hotfix Docker release

* change (ci): moving to the tested CI image with a proper name

* change (ci): rename substrate-ci-linux

* Reduce the lots_of_incoming_peers_works test load (paritytech#6314)

* change (ci): moving to the tested CI image with a proper name

* change (ci): rename substrate-ci-linux

* Reduce the lots_of_incoming_peers_works test load (paritytech#6314)

Co-authored-by: Bastian Köcher <[email protected]>
Co-authored-by: Pierre Krieger <[email protected]>

* Add a feature to create automatically a random temporary directory for base path & remove `Clone` (paritytech#6221)

* Initial commit

Forked at: 342caad
Parent branch: origin/master

* Add a feature to create automatically a temporary directory for base path

* doc fix and todos

* use parking_lot instead

* use refcell instead since we stay in the main thread

* remove Clone derives

* add test

* solving dependency issue

* clarifying doc

* conflict argument with base-path

* WIP

Forked at: 342caad
Parent branch: origin/master

* revert dep deletion

* fixing test and making base_path optional

* hold basepath while the service is running

* fixes

* Update client/cli/src/params/shared_params.rs

Co-authored-by: Bastian Köcher <[email protected]>

* Update client/service/Cargo.toml

Co-authored-by: Bastian Köcher <[email protected]>

* Update client/cli/src/commands/mod.rs

Co-authored-by: Bastian Köcher <[email protected]>

* Update client/service/src/config.rs

Co-authored-by: Bastian Köcher <[email protected]>

* WIP

Forked at: 342caad
Parent branch: origin/master

* improve doc

Co-authored-by: Bastian Köcher <[email protected]>

* Add a [prefix]_process_start_time_seconds metric (paritytech#6315)

* Make NumberOrHex a common primitive. (paritytech#6321)

* Make NumberOrHex a common primitive.

* Update primitives/rpc/src/number.rs

Co-authored-by: Nikolay Volf <[email protected]>

Co-authored-by: Nikolay Volf <[email protected]>

* Avoid self-lookups in Authority Discovery (paritytech#6317)

* Ensure authority discovery avoids self-lookups.

Thereby additionally guard the `NetworkService` against
adding the local peer to the PSM or registering a
"known address" for the local peer.

* Clarify comments.

* See if returning errors is ok.

* Fix quadratic iterations in transaction pool ready set (paritytech#6256)

* refactor ready set size calc

* Update client/transaction-pool/graph/src/ready.rs

Co-authored-by: Bastian Köcher <[email protected]>

* remove pub

* update to new variat

* rename

Co-authored-by: Bastian Köcher <[email protected]>

* Find the alive incoming entry on disconnect. (paritytech#6320)

When a peer in `Incoming` state disconnects, the "alive" entry
in the `incoming` list for that peer must be updated (set to `false`).
Currently the entry that is updated may be an earlier entry for the
same peer that is already no longer alive. This can happen if a
peer repeatedly connects (incoming) and disconnects between invocations to
`poll()` of the behaviour.

* Impl Debug and Display for Ss58AddressFormat when compiled with std (paritytech#6327)

* Initial commit

Forked at: 606c56d
Parent branch: origin/master

* Impl Debug and Display for Ss58AddressFormat when compiled with std

Fixes paritytech#6289

* Use write! instead of writeln!

* transaction-pool: expose blocking api for tx submission (paritytech#6325)

* transaction-pool: expose blocking api for tx submission

* service: separate ServiceBuilder::build for full and light

* service: add ServiceBuilder::build_common

* transaction-pool: extend docs

Co-authored-by: Tomasz Drwięga <[email protected]>

Co-authored-by: Tomasz Drwięga <[email protected]>

* Pruned and resubmitted metrics in transaction pool (paritytech#6322)

* pruned and resubmitted metrics

* update counter once

* Enable wasmtime on node-template (paritytech#6336)

* Enable wasmtime on node-template

* Apply suggestions from code review

syntax

Co-authored-by: Nikolay Volf <[email protected]>

Co-authored-by: Nikolay Volf <[email protected]>

* Adds support for storage parameter types (paritytech#6296)

* Adds support for storage parameter types

This pr adds a new parameter types type, the storage parameter types.
This parameter type supports loading the value from the storage or
returning the given default value.

* Use twox_128

* Update docs

* Update frame/support/src/lib.rs

Co-authored-by: Alexander Popiak <[email protected]>

Co-authored-by: Alexander Popiak <[email protected]>

* Basic documentation for Scheduler pallet (paritytech#6338)

Closes paritytech#5912

* Fix check-line-width CI script (paritytech#6326)

* Compare lines to the hash that the PR branched off from

* Use git merge-base to determine common ancestor

* Fixup

* client: use appropriate ExecutionContext for initial sync / regular import (paritytech#6180)

* client: use appropriate ExecutionContext for sync/import

* client: remove dead code

* client: ExecutionContext: distinguish between own and foreign imports

* client: fix cli parameter doc

* Revert "client: ExecutionContext: distinguish between own and foreign imports"

This reverts commit 0fac115.

* primitives: add docs for ExecutionContext

* cli: execution strategy docs

* cli: use different execution context for importing block on validator

* cli: remove defaults from execution context flags

* Fix transaction pool event sending (paritytech#6341)

This pr fixes a bug with the transaction pool not sending certain events
like finalized and also fixes the order of events. The problem with the
finalized event was that we did not extracted pruned extrinsics if there
were not ready transactions in the pool. However this is wrong, if we
have a re-org, a tx is clearly not ready anymore and we still need to
send a pruned event for it because it is in a new block included. This
also lead to sending "ready" events and tx being re-validated. The
listener also only send the "finalized" event if it has seen a block as
being included, which did not happen before with the old code.

The second fix of the pr is the order of events. If we prune and retract the
same transaction in the same block, we first need to send the "retract"
event and after that the "pruned" event, because finalization takes
longer and this would lead to the UI showing "retract" while it actually
is included.

* Deprecate FunctionOf and remove its users (paritytech#6340)

* Deprecate FunctionOf and remove users

* Remove unused import

* Add events for balance reserve and unreserve functions (paritytech#6330)

* almost works

* add clone to BalanceStatus

* reserve event

* fix staking tests

* fix balances tests

* Update frame/balances/src/tests.rs

Co-authored-by: Kian Paimani <[email protected]>

* restore tests and move event emission

* move repatriate reserved event outside of mutate_account

* clean up events in tests

Co-authored-by: Kian Paimani <[email protected]>

* Update contributing guide with new label policy (paritytech#6333)

* mention C and M labels in contributing guide

* update PR template with more specific instructions

* update PR template with updated label rules and contributing guide link

* update contibuting guide

* adding a ss58 format for Stafi Network (paritytech#6347)

* add extend_lock for StorageLock (paritytech#6323)

* add extend_lock for StorageLock

* changes

* changes

* Introduce in-origin filtering (paritytech#6318)

* impl filter in origin

* remove IsCallable usage. Breaking: utility::batch(root, calls) no longer bypass BasicCallFilter

* rename BasicCallFilter -> BaseCallFilter

* refactor code

* Apply suggestions from code review

Co-authored-by: Kian Paimani <[email protected]>

* remove forgotten temporar comment

* better add suggestion in another PR

* refactor: use Clone instead of mem::replace

* fix tests

* fix tests

* fix tests

* fix benchmarks

* Make root bypass filter in utility::batch

* fix unused imports

Co-authored-by: Kian Paimani <[email protected]>

* pallet-evm add get(fn) to AccountStorages (paritytech#6279)

* Add IPC support (paritytech#6348)

This is useful for both security and performance reasons. IPC is faster
than TCP, and it is subject to OS access controls.

* expose constants of pallet_recovery trait (paritytech#6363)

* Impl integrity test for runtime (paritytech#6356)

* impl integrity test for runtime

* Update frame/support/src/traits.rs

Co-authored-by: Bastian Köcher <[email protected]>

* Update frame/support/procedural/src/construct_runtime/mod.rs

Co-authored-by: Bastian Köcher <[email protected]>

* use thread local

* update doc

* Apply suggestions from code review

Co-authored-by: Bastian Köcher <[email protected]>
Co-authored-by: Gavin Wood <[email protected]>

* historical slashing w ocw w adhoc tree creation (paritytech#6220)

* draft

* steps

* chore: fmt

* step by step

* more details

* make test public

* refactor: split into on and offchain

* test stab

* tabs my friend

* offchain overlay: split key into prefix and true key

Simplifies inspection and makes key actually unique.

* test: share state

* fix & test

* docs improv

* address review comments

* cleanup test chore

* refactor, abbrev link text

* chore: linewidth

* fix prefix key split fallout

* minor fallout

* minor changes

* addresses review comments

* rename historical.rs -> historical/mod.rs

* avoid shared::* wildcard import

* fix: add missing call to store_session_validator_set_to_offchain

* fix/compile: missing shared:: prefix

* fix/test: flow

* fix/review: Apply suggestions from code review

Co-authored-by: Tomasz Drwięga <[email protected]>

* fix/review: more review comment fixes

* fix/review: make ValidatorSet private

* fix/include: core -> sp_core

* fix/review: fallout

* fix/visbility: make them public API

Ref paritytech#6358

* fix/review: review changes fallout - again

Co-authored-by: Bernhard Schuster <[email protected]>
Co-authored-by: Tomasz Drwięga <[email protected]>

* [CI] Auto-label new PRs according to draft status (paritytech#6361)

* add auto-label github action

* Add missing 'remove-labels' line

* Split the service initialisation up into seperate functions (paritytech#6332)

* Seperate out the complexity in ServiceBuilder::build_common into seperate functions

* Fix line widths

* Move some functions to their respective crates

* [CI] Add label enforcement (paritytech#6365)

* Add label enforcement

* fix .gitlab-ci.yml

* update check_labels.sh

* vesting: Force Vested Transfer (paritytech#6368)

* force-vested-transfer

* Tweak weights

* Update frame/vesting/src/lib.rs

Co-authored-by: joe petrowski <[email protected]>

Co-authored-by: joe petrowski <[email protected]>

* client/authority-discovery: Don't add own address to priority group (paritytech#6370)

* client/authority-discovery: Don't add own address to priority group

In the scenario of a validator publishing the address of its sentry node
to the DHT, said sentry node should not add its own Multiaddr to the
peerset "authority" priority group.

Related to 70cfeff.

* client/authority-discovery: Remove unused import PeerId

* client/authority-discovery/tests: Add tcp protocol to multiaddresses

* .gitlab-ci.yml: Run promtool on Prometheus alerting rules (paritytech#6344)

* .gitlab-ci.yml: Run promtool on Prometheus alerting rules

Add a CI stage to test the Prometheus alerting rules within
`.maintain/monitoring`.

* .gitlab-ci.yml: Switch Prometheus stage to paritytech/tools image

* .gitlab-ci.yml: Follow http redirects in Prometheus stage

* .gitlab-ci.yml: Fix Prometheus stage promtool folder name

* Use /dns/ instead of /dns4/ (paritytech#6369)

* add system_dryRun (paritytech#6300)

* add system_dryRun

* fix build error

* delete unneeded code

* return ApplyExtrinsicResult directly

* line width

* mark dry run unsafe

* line width

* fix test

* add test

* update comment

* fix BlockAttributes encoding (paritytech#6281)

* Allow Sudo to do anything (paritytech#6375)

* All Sudo to do anything.

* Rename old labels.

* Stored call in multisig (paritytech#6319)

* Stored call in multisig

* Docs.

* Benchmarks.

* Fix

* Update frame/multisig/src/lib.rs

Co-authored-by: Bastian Köcher <[email protected]>

* patch benchmarks

* Minor grumbles.

* Update as_multi weight

* Fixes and refactoring.

* Split out threshold=1 and opaquify Call.

* Compiles, tests pass, weights are broken

* Update benchmarks, add working tests

* Add benchmark to threshold 1, add event too

* suppress warning for now

* @xlc improvment nit

* Update weight and tests

* Test for weight check

* Fix line width

* one more line width error

* Apply suggestions from code review

Co-authored-by: Alexander Popiak <[email protected]>

* fix merge

* more @apopiak feedback

* Multisig handles no preimage

* Optimize return weight after dispatch

* Error on failed deposit.

Co-authored-by: Bastian Köcher <[email protected]>
Co-authored-by: Shawn Tabrizi <[email protected]>
Co-authored-by: Alexander Popiak <[email protected]>

* Fix the broken weight multiplier update function (paritytech#6334)

* Initial draft, has some todos left

* remove ununsed import

* Apply suggestions from code review

* Some refactors with migration

* Fix more test and cleanup

* Fix for companion

* Apply suggestions from code review

Co-authored-by: Alexander Popiak <[email protected]>

* Update bin/node/runtime/src/impls.rs

* Fix weight

* Add integrity test

* length is not affected.

Co-authored-by: Alexander Popiak <[email protected]>

* Restrict remove_proxies (paritytech#6383)

* Remove penalty on duplicate Status message (paritytech#6377)

* `decl_module!` print better error on duplicate reserved keyword (paritytech#6384)

* `decl_module!` print better error on duplicate reserved keyword

This prints a better error message on duplicated reserved keywords,
instead of complaining because of missing `origin`.

* Review feedback

* FixedPointNumber: zero is not positive. (paritytech#6385)

* Allow empty values in the storage (paritytech#6364)

* Allow empty values in the storage

* Bump trie-bench

* Bump trie-bench

* Pallet: Atomic Swap (paritytech#6349)

* Init atomic swap pallet

* Implement module swap operations

* Add successful swap test

* Bump node spec_version

* Fix storage name

* Add ProofLimit parameter to prevent proof size being too large

* Add missing events

* Basic weight support

* Add basic docs

* Mark swap on claim

This handles the additional case if `repatriate_reserved` fails.

* Add additional expire handler

* Update frame/atomic-swap/src/lib.rs

Co-authored-by: Shawn Tabrizi <[email protected]>

* Add docs on ProofLimit

* Fix test

* Return Ok(()) even when the transfer fails

Because we need to mark the swap as claimed no matter what.

* Remove retry logic

It's overkill. Swap is about something being executed, not necessarily successful.
Although there should be logic (reserve and unreserve) to make it so that both parties *believes*
that the execution is successful.

* succeed -> succeeded

* Add docs on duration -- revealer should use duration shorter than counterparty

* Missing trait type

Co-authored-by: Shawn Tabrizi <[email protected]>

* Runtime interface to add support for tracing from wasm (paritytech#6381)

* Add span recording to tracing implementation

* Add tracing proxy

* switch to rustc_hash::FxHashMap

* Replace lazy_static and hashmap with thread_local and vec.

* fix marking valid span as invalid while removing invalid spans

* refactor, add wasm_tracing module in `support`

* update registered spans

* tidy up

* typos

* refactor

* update flag name to signal lost trace - `is_valid_trace`

* update flag name to signal lost trace - `is_valid_trace`

* update docs

* update docs

* Use tracing Field recording to store the actual `name` and `target`
from wasm traces.

* fix debug log in subscriber + small refactor

* add tests

* handle misuse in case trying to exit span not held

* Implement filter for wasm traces, simplify field recording for primitive types

* remove superfluous warning

* update docs

* Update primitives/tracing/src/proxy.rs

Co-authored-by: Kian Paimani <[email protected]>

* Apply suggestions from code review

Co-authored-by: Bastian Köcher <[email protected]>

* update docs, apply suggestions

* move Proxy from thread_local to `Extension`, rename macro

* fix test

* unify native & wasm span macro calls

* implement wasm tracing control facility in primitives and frame

* add cli flag `--wasm-tracing`

* fix

* switch to `Option<u4>` (possible performance degradation), switch
to static mut bool

* performance improvement using u64 vs Option<u64>

* performance improvement moving concat to client

* update docs

* Update client/cli/src/params/import_params.rs

Co-authored-by: Cecile Tonglet <[email protected]>

* performance improvement

* Revert "performance improvement"

This reverts commit cff0aa2.

* small refactor

* formatting

* bump impl_version

* Update client/cli/src/config.rs

Co-authored-by: Bastian Köcher <[email protected]>

* update docs

* small fixes, remove pub static

* nit

* add integration tests and refactor Subscriber

* tests

* revert formatting

* try fix test that works locally but not in CI

* try fix test that works locally but not in CI

* debug test that works locally but not in CI

* fix test that works locally but not in CI

* remove pub visibility from bool in runtime

* make TracingSpanGuard #[cfg(not(feature = "std"))], update docs, comments

* make TracingProxy drop implementation conditional on !empty state

* add docs for TraceHandler

* remove blank line

* update expect message

* update tests

* rename cli option to tracing_enable_wasm

* rename cli option to tracing_enable_wasm

* fix

* ensure wasm-tracing features are wasm only

* bump impl_version

* bump impl_version

* add `"pallet-scheduler/std"` to `[features]` `std` in node/runtime

* refactor service to remove sp_tracing dependency

* refactor: line width, trait bounds

* improve LogTraceHandler output

* fix test

* improve tracing log output

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Bastian Köcher <[email protected]>

* swap wasm indication from trace name to a separate value

* Update client/tracing/src/lib.rs

* add docs

* remove runtime features

remove wasm_tracing option from CLI

remove wasm_tracing flag from ProfilingSubscriber

Co-authored-by: Matt Rutherford <[email protected]>
Co-authored-by: Kian Paimani <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
Co-authored-by: Cecile Tonglet <[email protected]>

* Block packet size limit

* Revert "Block packet size limit"

This reverts commit 9a5892e.

* Update s3 artifact url (paritytech#6399)

* Increase network buffer sizes even more (paritytech#6080)

* Remove pallet-balances from non-dev-deps (paritytech#6407)

* Babe VRF Signing in keystore (paritytech#6225)

* Introduce trait

* Implement VRFSigner in keystore

* Use vrf_sign from keystore

* Convert output to VRFInOut

* Simplify conversion

* vrf_sign secondary slot using keystore

* Fix RPC call to claim_slot

* Use Public instead of Pair

* Check primary threshold in signer

* Fix interface to return error

* Move vrf_sign to BareCryptoStore

* Fix authorship_works test

* Fix BABE logic leaks

* Acquire a read lock once

* Also fix RPC acquiring the read lock once

* Implement a generic way to construct VRF Transcript

* Use make_transcript_data to call sr25519_vrf_sign

* Make sure VRFTranscriptData is serializable

* Cleanup

* Move VRF to it's own module

* Implement & test VRF signing in testing module

* Remove leftover

* Fix feature requirements

* Revert removing vec macro

* Drop keystore pointer to prevent deadlock

* Nitpicks

* Add test to make sure make_transcript works

* Fix mismatch in VRF transcript

* Add a test to verify transcripts match in babe

* Return VRFOutput and VRFProof from keystore

* Update `libp2p-ping`. (paritytech#6412)

Bugfix release, see [CHANGELOG].

[CHANGELOG]: https://github.com/libp2p/rust-libp2p/blob/master/protocols/ping/CHANGELOG.md

* Remove --legacy-network-protocol CLI flag (paritytech#6411)

* Scale and increase validator count (paritytech#6417)

* Expose constants from Proxy Pallet (paritytech#6420)

* .maintain/monitoring: Add alerting rule tests (paritytech#6343)

* .maintain/monitoring: Add alerting rule tests

* .maintain/monitoring/alerting-rules/alerting-rules.yaml: Break lines

* .gitlab-ci.yml: Add promtool rule testing step

* [CI] Label PRs if polkadot companion build fails (paritytech#6410)

* add polkadot-companion-labels.yml

* fix polkadot companion job name

* add opened event to polkadot-companion-labels.yml

* Dont label on timeouts

* increase timeouts

* increase timeouts again... to be sure

* Switch to s3krit/await-status-action

Turns out Sibz/await-status-action looks at /ref/statuses, which lists ALL statuses (i.e., if you send a pending and a failure for the same context, it will see both and assume the job is still pending.). I forked and point at /ref/status, which shows a combined summary of each status (i.e., only ever shows the most recent status of a single context).

* Print bad mandatory error (paritytech#6416)

* Print bad mandatory error

This prints the error that leads to bad mandatory.

* Update frame/system/src/lib.rs

Co-authored-by: Shawn Tabrizi <[email protected]>

* Adds missing trait import

Co-authored-by: Shawn Tabrizi <[email protected]>

* Track last blocks in informant display (paritytech#6429)

This implements tracking of the last seen blocks in informant display
to prevent printing the import message twice. In Cumulus we first import
blocks as part of the block building with `new_best == false` and set
the best block after we know which one was included by the relay chain.
This leads to printing the import messages two times. This pr solves the
problem by track the latest seen blocks to not print the message twice.

* Simple Docs for Atomic Swap Pallet (paritytech#6434)

* Simple Docs for Atomic Swap Pallet

* Fix copy-and-paste error

* More descriptive error message when invalid slot duration is used (paritytech#6430)

* Initial commit

Forked at: d735e4d
No parent branch.

* Errors if slot_duration is zero

* Errors if slot_duration is zero

* Revert "Errors if slot_duration is zero"

This reverts commit a9e9820.

* Update client/consensus/slots/src/lib.rs

Co-authored-by: Bastian Köcher <[email protected]>

* Root origin use no filter by default. Scheduler and Democracy dispatch without asserting BaseCallFilter (paritytech#6408)

* make system root origin build runtime origin with no filter

* additional doc

* llow decl-module to have a where clause with trailing comma (paritytech#6431)

* .gitlab-ci.yml: Use promtool from paritytech/tools:latest image (paritytech#6425)

* Update sync chain info on own block import (paritytech#6424)

Before we only updated the chain info of sync when we have imported
something using the import queue. However, if you import your own
blocks, this is not done using the import queue and so sync is not
updated. If we don't do this, it can lead to sync switching to "major
sync" mode because sync is not informed about new blocks. This
especially happens on Cumulus, where a collator is selected multiple
times to include its block into the relay chain and thus, sync switches
to major sync mode while the node is still building blocks.

* client/authority-discovery: Compare PeerIds and not Multihashes (paritytech#6414)

In order to tell whether an address is the local nodes address the
authority discovery module previously compared the Multihash within the
`p2p` Multiaddr protocol.

rust-libp2p recently switched to a new PeerId representation (see [1]).
Multihashes of the same PeerId in the new and the old format don't
equal.

Instead of comparing the Multihashes, this patch ensures the module
compares the PeerIds

[1] libp2p/rust-libp2p#555

* add network propagated metrics (paritytech#6438)

* change (ci): add interruptible to kubernetes jobs (paritytech#6441)

* Avoid multisig reentrancy (paritytech#6445)

* Validate encoding of extrinsics passed to runtime (paritytech#6442)

* Validate encoding of extrinsics passed to runtime

* Bump codec version explicitly

* Fix Babe secondary plain slots claiming (paritytech#6451)

We need to check that the public key of an authority exists in our
keystore before we can successfully claim a plain secondary slot.
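
A minimal sketch of the added guard, with placeholder types rather than the real BABE keystore handling:

```rust
/// Placeholder for an authority public key.
type AuthorityId = [u8; 32];

/// Only claim a secondary plain slot when the slot's expected author is one
/// of our own keys; otherwise the slot belongs to another authority.
fn can_claim_secondary_plain_slot(
    expected_author: &AuthorityId,
    our_keys: &[AuthorityId],
) -> bool {
    our_keys.iter().any(|key| key == expected_author)
}
```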

* sp-npos-elections should not depend on itself (paritytech#6444)

This removes the `dev-dependency` on `sp-npos-elections` from itself.
A crate should not depend on itself directly, especially not just to
make macros work.

* Don't autolabel insubstantial PRs 'pleasereview' (paritytech#6447)

* change everything to transaction (paritytech#6440)

* node: spawn block authoring and grandpa voter as blocking tasks (paritytech#6446)

* service: add spawner for essential tasks

* node: spawn block authoring and grandpa voter as blocking tasks

* Apply suggestions from code review

Co-authored-by: Bastian Köcher <[email protected]>

* pallet-atomic-swap: generalized swap action (paritytech#6421)

* pallet-atomic-swap: generalized swap action

* Bump spec_version

* Fix weight calculation

* Remove unnecessary type aliases

* Fix issues with `Operational` transactions validity and prioritization. (paritytech#6435)

* Fix weight limit for operational transactions.

* Include BlockExecutionWeight.
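
A toy illustration of the adjusted check (the constants are made up; the real limits come from the runtime's weight configuration): an operational extrinsic has to fit within the maximum block weight after accounting for the fixed per-block execution overhead.

```rust
/// Illustrative numbers only, not the real runtime configuration.
const MAXIMUM_BLOCK_WEIGHT: u64 = 2_000_000_000_000;
const BLOCK_EXECUTION_WEIGHT: u64 = 5_000_000_000;

/// Would an operational extrinsic of `tx_weight` still fit into a block
/// that has already used `used_weight`, once the unavoidable per-block
/// execution cost is included?
fn fits_operational_limit(tx_weight: u64, used_weight: u64) -> bool {
    BLOCK_EXECUTION_WEIGHT + used_weight + tx_weight <= MAXIMUM_BLOCK_WEIGHT
}
```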

* `pallet-staking`: Expose missing consts (paritytech#6456)

* `pallet-staking`: Expose missing consts

* Apply suggestions from code review

Co-authored-by: Nikolay Volf <[email protected]>
Co-authored-by: joe petrowski <[email protected]>

* Update the source docs

Co-authored-by: Kian Paimani <[email protected]>
Co-authored-by: Nikolay Volf <[email protected]>
Co-authored-by: joe petrowski <[email protected]>

* update collective events docs to be consistent with changes (paritytech#6463)

* [CI] Don't tag PRs on companion job cancels (paritytech#6470)

* network: remove unused variable (paritytech#6460)

* Avoid panic on dropping a `sc_network::service::out_events::Receiver`. (paritytech#6458)

* Avoid panic on dropping a `Receiver`.

* CI

* Implement nested storage transactions (paritytech#6269)

* Add transactional storage functionality to OverlayChanges

A collection already has a natural empty state. No need to
wrap it in an `Option`.

* Add storage transactions runtime interface

* Add frame support for transactions

* Fix committed typo

* Rename 'changes' variable to 'overlay'

* Fix renaming change

* Fixed strange line break

* Rename clear to clear_where

* Add comment regarding delete value on mutation

* Add comment which changes are covered by a transaction

* Force the arg to `with_transaction` to return a `Result`

* Use rust doc comments on every documentable place

* Fix wording of `insert_dirty` doc

* Improve doc on start_transaction

* Rename value to overlayed in close_transaction

* Inline negation

* Improve wording of close_transaction comments

* Get rid of an expect by using get_or_insert_with

* Remove trailing whitespace

* Rename should to expected in tests

* Rolling back a transaction must mark the overlay as dirty

* Protect client-initiated storage tx from being dropped by runtime

* Review nits

* Return Err when entering or exiting runtime fails

* Documentation fixup

* Remove close type

* Move enter/exit runtime to `execute_aux` in the state-machine

* Rename Discard -> Rollback

* Move child changeset creation to constructor

* Move child spawning into the closure

* Apply suggestions from code review

Co-authored-by: Bastian Köcher <[email protected]>

* Fixup for code suggestion

* Unify re-exports

* Rename overlay_changes to mod.rs and move into subdir

* Change proof wording

* Adapt a new test from master to storage-tx

* Suggestions from the latest round of review

* Fix warning message

Co-authored-by: Bastian Köcher <[email protected]>
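
The heart of the PR is the overlay bookkeeping behind the transactional API. A self-contained toy model of that mechanism (simplified types, not the actual `OverlayedChanges` code): each nested transaction keeps an undo log of first-touched values, rollback replays the log, and commit folds it into the parent transaction.

```rust
use std::collections::HashMap;

/// Toy overlay with nested transactions. Each open transaction records,
/// for every key it touches, the value that key had *before* the
/// transaction started (`None` = key was absent).
struct Overlay {
    values: HashMap<String, u32>,
    /// One undo log per open transaction, innermost last.
    transactions: Vec<HashMap<String, Option<u32>>>,
}

impl Overlay {
    fn new() -> Self {
        Self { values: HashMap::new(), transactions: Vec::new() }
    }

    fn start_transaction(&mut self) {
        self.transactions.push(HashMap::new());
    }

    fn set(&mut self, key: &str, value: u32) {
        let previous = self.values.insert(key.to_string(), value);
        if let Some(tx) = self.transactions.last_mut() {
            // Only the value seen at first touch matters for undo; later
            // writes to the same key must not overwrite the undo entry.
            tx.entry(key.to_string()).or_insert(previous);
        }
    }

    /// Undo every write made since the matching `start_transaction`.
    fn rollback_transaction(&mut self) {
        let undo = self.transactions.pop().expect("no open transaction");
        for (key, previous) in undo {
            match previous {
                Some(value) => { self.values.insert(key, value); }
                None => { self.values.remove(&key); }
            }
        }
    }

    /// Keep the writes; fold the undo log into the parent transaction so
    /// that rolling back the parent still restores the pre-parent state.
    fn commit_transaction(&mut self) {
        let undo = self.transactions.pop().expect("no open transaction");
        if let Some(parent) = self.transactions.last_mut() {
            for (key, previous) in undo {
                parent.entry(key).or_insert(previous);
            }
        }
    }
}

fn main() {
    let mut overlay = Overlay::new();
    overlay.set("a", 1);
    overlay.start_transaction();
    overlay.set("a", 2);
    overlay.start_transaction(); // nested transaction
    overlay.set("a", 3);
    overlay.rollback_transaction(); // inner writes undone: a == 2
    assert_eq!(overlay.values.get("a"), Some(&2));
    overlay.commit_transaction(); // outer writes kept: a == 2
    assert_eq!(overlay.values.get("a"), Some(&2));
}
```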

* Optimize offchain worker api by re-using http-client (paritytech#6454)

* Fix typo in offchain's docs

* Use Self keyword in AsyncApi::new()

* Move httpclient to be part of OffchainWorkers to optimize block import

* Fix compilation errors for tests

* Add wrapper struct for HyperClient

* Use lazy_static share SharedClient amongst OffchainWorkers. Remove the need to raise the fd limit

* Revert "Use lazy_static share SharedClient amongst OffchainWorkers. Remove the need to raise the fd limit"

This reverts commit 7af9749.

* Add lazy_static for tests
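
A sketch of the shape this ends up with (hypothetical wrapper names loosely mirroring the PR's `SharedClient`; assumes the `hyper` crate): build the HTTP client once, store it on the long-lived offchain workers object, and hand cheap clones to each worker run instead of opening a fresh connection pool per block.

```rust
use std::sync::Arc;

use hyper::{client::HttpConnector, Client};

/// Cheaply clonable handle to a single shared hyper client.
#[derive(Clone)]
struct SharedClient(Arc<Client<HttpConnector>>);

impl SharedClient {
    fn new() -> Self {
        SharedClient(Arc::new(Client::new()))
    }
}

/// Stand-in for the long-lived offchain workers object.
struct OffchainWorkers {
    http_client: SharedClient,
}

impl OffchainWorkers {
    fn new() -> Self {
        Self { http_client: SharedClient::new() }
    }

    fn on_block_imported(&self) {
        // Every worker run reuses the pooled client instead of creating a
        // new one (and new file descriptors) per block.
        let _client = self.http_client.clone();
        // ... spawn the offchain worker with `_client` ...
    }
}
```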

* Remove lingering runtime upgrades (paritytech#6476)

* Remove lingering runtime upgrades

* remove unused warnings

* remove tests

* impl Debug for sc_service::Configuration (paritytech#6400)

* Initial commit

Forked at: d735e4d
No parent branch.

* Make sc_service::Configuration derive Debug

* Replace task_executor fn's input by proper TaskExecutor type (cleaner)

* impl From<Fn> for TaskExecutor

* Update client/cli/src/runner.rs

* Add some doc, examples and tests

* Replace Deref by fn spawn as suggested

Co-authored-by: Bastian Köcher <[email protected]>
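
A compact sketch of the newtype idea (close in spirit to, but not copied from, the final `sc_service` type): wrap the spawning closure so `Configuration` can derive `Debug`, while `From` keeps construction from any suitable closure ergonomic.

```rust
use std::{fmt, future::Future, pin::Pin, sync::Arc};

type BoxFuture = Pin<Box<dyn Future<Output = ()> + Send>>;

/// Wraps whatever executor the embedder provides.
#[derive(Clone)]
struct TaskExecutor(Arc<dyn Fn(BoxFuture) + Send + Sync>);

impl fmt::Debug for TaskExecutor {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The closure itself is opaque; a fixed tag keeps `#[derive(Debug)]`
        // usable on any struct containing a `TaskExecutor`.
        f.write_str("TaskExecutor")
    }
}

impl<F> From<F> for TaskExecutor
where
    F: Fn(BoxFuture) + Send + Sync + 'static,
{
    fn from(func: F) -> Self {
        TaskExecutor(Arc::new(func))
    }
}

impl TaskExecutor {
    /// Hand a future to the wrapped executor.
    fn spawn(&self, future: BoxFuture) {
        (self.0)(future)
    }
}
```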

* Fix `sp-api` handling of multiple arguments (paritytech#6484)

With the switch to `decode_all_with_depth_limit` we silently broke
support for functions with multiple arguments. The old generated code
tried to decode each parameter separately, which does not play well with
`decode_all`.

This PR adds a test to ensure that this does not happen again and fixes
the bug by decoding everything at once, wrapping the parameters into a tuple.
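
A small demonstration of why per-parameter decoding clashes with all-input-consumed checking (standalone, using the `parity-scale-codec` crate): the encoding of several parameters is just the concatenation of their encodings, so only decoding them as one tuple consumes the whole input.

```rust
use parity_scale_codec::{Decode, Encode};

fn main() {
    let params = (42u32, "hello".to_string());
    let encoded = params.encode();

    // Decoding the first parameter alone leaves the second one's bytes
    // behind, which a decode-all style check would reject as trailing data.
    let mut input = &encoded[..];
    let first = u32::decode(&mut input).unwrap();
    assert_eq!(first, 42);
    assert!(!input.is_empty());

    // Decoding the parameters as a single tuple consumes the whole input.
    let mut input = &encoded[..];
    let decoded = <(u32, String)>::decode(&mut input).unwrap();
    assert_eq!(decoded, params);
    assert!(input.is_empty());
}
```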

* Fix the browser node and ensure it doesn't colour the informant output (paritytech#6457)

* Fix browser informant

* Fix documentation

* Add an informant_output_format function to the cli config

* Wrap informant output format in an option

* Revert batch verifier

* Remove wasm-timer from primitives io cargo lock

* Drop informant_output_format function

* derive debug for output format

* bound some missing bound for elevated trait (paritytech#6487)

* `pallet-scheduler`: Check that `when` is not in the past (paritytech#6480)

* `pallet-scheduler`: Check that `when` is not in the past

* Break some lines
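
The check itself is small; a minimal sketch with placeholder types (the real pallet returns a scheduler error, not a string):

```rust
/// Reject scheduling a task for a block that is not strictly in the future.
fn ensure_future_block(when: u32, now: u32) -> Result<(), &'static str> {
    if when <= now {
        return Err("target block number is in the past");
    }
    Ok(())
}
```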

* client/network/service: Add primary dimension to connection metrics (paritytech#6472)

* client/network/service: Add primary dimension to connection metrics

Two nodes can be interconnected via one or more connections. The first
of those connections is called the primary connection.

This commit adds another dimension to the
`sub_libp2p_connections_{closed,opened}_total` metrics to differentiate
primary and non-primary connections being opened / closed.

Intuitively, more than one connection between two nodes is rare.
Tracking whether a connection is primary or not will help prove
or disprove this intuition.

* .maintain/monitoring: Ensure to sum over all connections_closed variants

* client/network/service: Rename is_primary to is_first

* client/network/service: Split by metric name with two additional metrics

* Revert ".maintain/monitoring: Ensure to sum over all connections_closed variants"

This reverts commit 2d2f93e.

* client/network/service: Remove labels from distinct metrics
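
For the labelled-counter variant described above (the PR later iterated towards distinct metric names, as the commits show), the general pattern with the `prometheus` crate looks like this; metric and label names are illustrative:

```rust
use prometheus::{CounterVec, Opts, Registry};

fn main() -> Result<(), prometheus::Error> {
    let registry = Registry::new();

    let connections_opened = CounterVec::new(
        Opts::new(
            "sub_libp2p_connections_opened_total",
            "Total number of opened connections",
        ),
        // "first" distinguishes the primary connection to a peer from
        // any additional ones.
        &["direction", "first"],
    )?;
    registry.register(Box::new(connections_opened.clone()))?;

    // Record an inbound connection that is the first one to this peer.
    connections_opened.with_label_values(&["in", "true"]).inc();
    Ok(())
}
```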

* Ensure the listen addresses are consistent with the transport (paritytech#6436)

* Initial commit

Forked at: 0c42ced
No parent branch.

* Ensure the listen addresses are consistent with the transport

* Update client/network/src/error.rs

* Update client/network/src/service.rs

* Better implementation

* Fix bad previous impl

* add boot_nodes

* reserved nodes

* test boot nodes

* reserved nodes tests

* add public_addresses and make specific error type

* Update client/network/src/error.rs

Co-authored-by: Pierre Krieger <[email protected]>

Co-authored-by: Pierre Krieger <[email protected]>

* pallet-contracts: migrate to nested storage transaction mechanism (paritytech#6382)

* Add a simple direct storage access module

* WIP

* Completely migrate to the transactional system.

* Format

* Fix wasm compilation

* Get rid of account_db module

* Make deposit event eager

* Make restore_to eager

* It almost compiles.

* Make it compile.

* Make the tests compile

* Get rid of account_db

* Drop the result.

* Backport the bookkeeping.

* Fix all remaining tests.

* Make it compile for std

* Remove a stale TODO marker

* Remove another stale TODO

* Add proof for `terminate`

* Remove a stale comment.

* Make restoration diverging.

* Remove redundant trait: `ComputeDispatchFee`

* Update frame/contracts/src/exec.rs

Co-authored-by: Alexander Theißen <[email protected]>

* Introduce proper errors into the storage module.

* Adds comments for contract storage module.

* Inline `ExecutionContext::terminate`.

* `restore_to` should not let a contract sacrifice itself if it is present on the stack.

* Inline `transfer` function

* Update doc - add "if succeeded"

* Adapt to TransactionOutcome changes

* Updates the docs for `ext_restore_to`

* Add a proper assert.

* Update frame/contracts/src/wasm/runtime.rs

Co-authored-by: Alexander Theißen <[email protected]>

Co-authored-by: Alexander Theißen <[email protected]>
Co-authored-by: Alexander Theißen <[email protected]>

* Update lock

Co-authored-by: Subsocial <[email protected]>
Co-authored-by: Benjamin Kampmann <[email protected]>
Co-authored-by: Nikolay Volf <[email protected]>
Co-authored-by: Rakan Alhneiti <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
Co-authored-by: Gavin Wood <[email protected]>
Co-authored-by: Cecile Tonglet <[email protected]>
Co-authored-by: Pierre Krieger <[email protected]>
Co-authored-by: Shawn Tabrizi <[email protected]>
Co-authored-by: Seun Lanlege <[email protected]>
Co-authored-by: David Craven <[email protected]>
Co-authored-by: Marcio Diaz <[email protected]>
Co-authored-by: Shaopeng Wang <[email protected]>
Co-authored-by: Denis Pisarev <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
Co-authored-by: Sergei Shulepov <[email protected]>
Co-authored-by: Roman Borschel <[email protected]>
Co-authored-by: André Silva <[email protected]>
Co-authored-by: Tomasz Drwięga <[email protected]>
Co-authored-by: Alexander Popiak <[email protected]>
Co-authored-by: Dan Forbes <[email protected]>
Co-authored-by: Alexander Theißen <[email protected]>
Co-authored-by: joe petrowski <[email protected]>
Co-authored-by: Kian Paimani <[email protected]>
Co-authored-by: Tore19 <[email protected]>
Co-authored-by: wangjj9219 <[email protected]>
Co-authored-by: Guillaume Thiolliere <[email protected]>
Co-authored-by: tgmichel <[email protected]>
Co-authored-by: Demi Obenour <[email protected]>
Co-authored-by: Bernhard Schuster <[email protected]>
Co-authored-by: Bernhard Schuster <[email protected]>
Co-authored-by: s3krit <[email protected]>
Co-authored-by: Ashley <[email protected]>
Co-authored-by: Max Inden <[email protected]>
Co-authored-by: Xiliang Chen <[email protected]>
Co-authored-by: Svyatoslav Nikolsky <[email protected]>
Co-authored-by: Arkadiy Paronyan <[email protected]>
Co-authored-by: Wei Tang <[email protected]>
Co-authored-by: mattrutherford <[email protected]>
Co-authored-by: Matt Rutherford <[email protected]>
Co-authored-by: ddorgan <[email protected]>
Co-authored-by: Toralf Wittner <[email protected]>
Co-authored-by: pscott <[email protected]>
Co-authored-by: Alexander Theißen <[email protected]>