Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Four IBC packets are stuck on Osmosis <> Evmos channel #2453

Closed
6 tasks
adizere opened this issue Jul 21, 2022 · 5 comments · Fixed by #2512
Closed
6 tasks

Four IBC packets are stuck on Osmosis <> Evmos channel #2453

adizere opened this issue Jul 21, 2022 · 5 comments · Fixed by #2512
Labels
A: blocked Admin: blocked by another (internal/external) issue or PR I: dependencies Internal: related to dependencies
Milestone

Comments

@adizere
Copy link
Member

adizere commented Jul 21, 2022

Summary of Bug

There are 4 stuck packets in the Evmos <> Osmosis channel.
https://www.mintscan.io/osmosis/relayers/channel-204
The channel ends are summarized below.

hermes --config ~/.hermes/evmos.toml query channel ends --chain osmosis-1 --port transfer --channel channel-204
ChannelEndsSummary {
    chain_id: ChainId {
        id: "osmosis-1",
        version: 1,
    },
    client_id: ClientId(
        "07-tendermint-1840",
    ),
    connection_id: ConnectionId(
        "connection-1440",
    ),
    channel_id: ChannelId(
        "channel-204",
    ),
    port_id: PortId(
        "transfer",
    ),
    counterparty_chain_id: ChainId {
        id: "evmos_9001-2",
        version: 2,
    },
    counterparty_client_id: ClientId(
        "07-tendermint-0",
    ),
    counterparty_connection_id: ConnectionId(
        "connection-0",
    ),
    counterparty_channel_id: ChannelId(
        "channel-0",
    ),
    counterparty_port_id: PortId(
        "transfer",
    ),
}

The packets were created ~1 month ago and have the following sequence numbers.

hermes --config ~/.hermes/evmos.toml query packet pending --chain osmosis-1 --port transfer --channel channel-204
Summary {
    src: PendingPackets {
        unreceived_packets: [
            Sequence(
                127037,
            ),
            Sequence(
                127038,
            ),
            Sequence(
                127044,
            ),
            Sequence(
                127045,
            ),
        ],
        unreceived_acks: [],
    },
    dst: PendingPackets {
        unreceived_packets: [],
        unreceived_acks: [],
    },
}

Two of the four transactions:

Since these packets have been stuck, plenty of traffic went through that same channel. For reference, the sequence number of the last packet (currently) on osmosis-1:channel-204 that was successfully received on evmos and then acknowledged back on osmosis-1 is 187,791.

Why are these packets stuck?

If we try to clear those packets, we see the following:

Running `target/debug/hermes --config /Users/adi/.hermes/evmos.toml clear packets --chain osmosis-1 --port transfer --channel channel-204`

2022-07-21T12:31:16.332567Z  INFO ThreadId(01) PacketRecvCmd{src_chain=osmosis-1 src_port=transfer src_channel=channel-204 dst_chain=evmos_9001-2}: unreceived packets found: 4
2022-07-21T12:31:33.638682Z  INFO ThreadId(01) PacketRecvCmd{src_chain=osmosis-1 src_port=transfer src_channel=channel-204 dst_chain=evmos_9001-2}: pulled packet data for 0 events; events_total=4 events_left=0

So Hermes is able to identify the 4 packets, but does not find packet data "pulled packet data for 0 events". In more details:

  • looking at transaction hash DF4A3B44667F33231248F3C9690D2573E9195C8897CE8B37067B7B67BD844843 in particular
  • any full node Hermes uses for queries (/tx_search endpoint) returns the event result with log "account sequence mismatch, expected 130, got 129"
    • the height reported is 4'721'268
  • mintscan shows for the same transaction hash that it's pending
    • height reported by mintscan is 4'713'069

It is unclear why are there 2 separate heights associated with this transaction hash. But we were able to confirm from the Tendermint team that:

we discovered now the issue for why hermes is not able to fetch the successful transaction hash -- it's because tendermint overwrites the tx index with the latest tx (in this case, the failing tx with account seq mismatch)
however, you can see it when you check this out: https://rpc.osmosis.interbloc.org/block_results?height=4713069

Reference for the Tendermint issue: tendermint/tendermint#8945

The full output is very large (part of output is here), but the packet data can be seen there.
The problem is that it seems the tx_search endpoint is unable to reach that packet data. So far, we tried two approaches:

  • different full nodes
  • different ways of specifying the query
    • = 4713069 (height specification) versus <= 4713070 versus versus eliminating all fields identifying the packet and keeping only send_packet.packet_sequence=127044
    • specify the Height as a String versus as a i64 (String doesn't work)
    • parsing the query from a String (let query: Query = "tx.height=4713069 AND send_packet.packet_sequence=127044".parse().unwrap();) instead of building it via Query::eq interface.

The typical outcome is we find no response.

2022-07-21T17:55:29.578249Z WARN ThreadId(11) using this request Query { event_type: None, conditions: [Eq("tx.height", Unsigned(4713069)), Eq("send_packet.packet_sequence", Unsigned(127037))] }
2022-07-21T17:55:30.261687Z WARN ThreadId(11) got this response Response { txs: [], total_count: 0 }

If we are more loose in the way we specify the query, we find the packet data is more recent (from block::Height(4721268)), which includes "account sequence mismatch" error and the relayer is unable to use.

hermes --config ~/.hermes/evmos.toml query packet pending --chain osmosis-1 --port transfer --channel channel-204

2022-07-21T16:57:12.942955Z WARN ThreadId(11) using this request Query { event_type: None, conditions: [Eq("send_packet.packet_src_channel", String("channel-204")), Eq("send_packet.packet_src_port", String("transfer")), Eq("send_packet.packet_dst_channel", String("channel-0")), Eq("send_packet.packet_dst_port", String("transfer")), Eq("send_packet.packet_sequence", String("127037")), Eq("tx.height", Signed(4713068))] }
2022-07-21T16:57:13.004196Z WARN ThreadId(11) got this response Response { txs: [Response { hash: transaction::Hash(8206FD8F5AA7F8B8E13FBC68C34BFB2BAE340609F2E087D9A155FC12BD4AAC2F), height: block::Height(4721268), index: 34, tx_result: DeliverTx { code: Err(32), data: Data([]), log: Log("account sequence mismatch, expected 100, got 99: incorrect account sequence"), info: Info(""), gas_wanted: Gas(450000), gas_used: Gas(53465), events: [], codespace: Codespace("sdk") }, tx: Transaction([10, 147, 2, 10, 144, 2, 10, 41, 47, 105, 98, 99, 46, 97, 112, 112, 108, 105, 99, 97, 116, 105, 111, 110, 115, 46, 116, 114, 97, 110, 115, 102, 101, 114, 46, 118, 49, 46, 77, 115, 103, 84, 114, 97, 110, 115, 102, 101, 114, 18, 226, 1, 10, 8, 116, 114, 97, 110, 115, 102, 101, 114, 18, 11, 99, 104, 97, 110, 110, 101, 108, 45, 50, 48, 52, 26, 92, 10, 68, 105, 98, 99, 47, 54, 65, 69, 57, 56, 56, 56, 51, 68, 52, 68, 53, 68, 53, 70, 70, 57, 69, 53, 48, 68, 55, 49, 51, 48, 70, 49, 51, 48, 53, 68, 65, 50, 70, 70, 65, 48, 67, 54, 53, 50, 68, 49, 68, 68, 57, 67, 49, 50, 51, 54, 53, 55, 67, 54, 66, 52, 69, 66, 50, 68, 70, 56, 65, 18, 20, 53, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 34, 43, 111, 115, 109, 111, 49, 54, 56, 114, 53, 99, 114, 114, 113, 103, 104, 118, 104, 116, 115, 114, 97, 121, 115, 104, 102, 104, 107, 112, 122, 101, 109, 119, 50, 122, 106, 102, 56, 104, 116, 55, 57, 97, 120, 42, 44, 101, 118, 109, 111, 115, 49, 114, 112, 53, 103, 108, 97, 117, 48, 57, 101, 52, 117, 118, 103, 121, 51, 116, 113, 102, 119, 116, 103, 51, 114, 57, 114, 101, 53, 119, 51, 106, 108, 99, 50, 114, 55, 110, 97, 50, 6, 8, 2, 16, 190, 133, 40, 56, 128, 184, 158, 208, 217, 226, 162, 251, 22, 18, 104, 10, 80, 10, 70, 10, 31, 47, 99, 111, 115, 109, 111, 115, 46, 99, 114, 121, 112, 116, 111, 46, 115, 101, 99, 112, 50, 53, 54, 107, 49, 46, 80, 117, 98, 75, 101, 121, 18, 35, 10, 33, 2, 240, 247, 144, 31, 128, 166, 60, 165, 189, 252, 116, 121, 104, 203, 71, 100, 14, 57, 23, 106, 119, 21, 33, 183, 239, 67, 240, 9, 137, 212, 227, 182, 18, 4, 10, 2, 8, 127, 24, 99, 18, 20, 10, 14, 10, 5, 117, 111, 115, 109, 111, 18, 5, 49, 49, 50, 53, 48, 16, 208, 187, 27, 26, 64, 246, 208, 145, 156, 174, 43, 155, 219, 141, 90, 194, 155, 70, 46, 135, 219, 200, 155, 3, 229, 146, 20, 128, 173, 52, 42, 242, 67, 245, 12, 169, 59, 67, 178, 62, 202, 137, 18, 42, 181, 134, 211, 89, 182, 193, 107, 10, 15, 248, 168, 9, 224, 67, 16, 16, 151, 121, 128, 169, 246, 40, 233, 254, 75]), proof: None }], total_count: 1 }
2022-07-21T16:57:13.004776Z WARN ThreadId(11) using this request Query { event_type: None, conditions: [Eq("send_packet.packet_src_channel", String("channel-204")), Eq("send_packet.packet_src_port", String("transfer")), Eq("send_packet.packet_dst_channel", String("channel-0")), Eq("send_packet.packet_dst_port", String("transfer")), Eq("send_packet.packet_sequence", String("127038")), Eq("tx.height", Signed(4713068))] }
2022-07-21T16:57:13.091777Z WARN ThreadId(11) got this response Response { txs: [Response { hash: transaction::Hash(9DA4304E5F69D59DE88DBA3E43D353ADA35B56C003DBEA7FC0F690CB9ECC99AA), height: block::Height(4721268), index: 74, tx_result: DeliverTx { code: Err(32), data: Data([]), log: Log("account sequence mismatch, expected 607, got 606: incorrect account sequence"), info: Info(""), gas_wanted: Gas(450000), gas_used: Gas(52644), events: [], codespace: Codespace("sdk") }, tx: Transaction([10, 197, 1, 10, 194, 1, 10, 41, 47, 105, 98, 99, 46, 97, 112, 112, 108, 105, 99, 97, 116, 105, 111, 110, 115, 46, 116, 114, 97, 110, 115, 102, 101, 114, 46, 118, 49, 46, 77, 115, 103, 84, 114, 97, 110, 115, 102, 101, 114, 18, 148, 1, 10, 8, 116, 114, 97, 110, 115, 102, 101, 114, 18, 11, 99, 104, 97, 110, 110, 101, 108, 45, 50, 48, 52, 26, 14, 10, 5, 117, 111, 115, 109, 111, 18, 5, 54, 51, 55, 52, 51, 34, 43, 111, 115, 109, 111, 49, 54, 51, 114, 50, 102, 97, 107, 103, 51, 120, 100, 118, 106, 121, 51, 118, 114, 113, 51, 97, 56, 52, 102, 54, 102, 120, 120, 100, 117, 101, 116, 114, 120, 116, 107, 106, 114, 114, 42, 44, 101, 118, 109, 111, 115, 49, 101, 55, 48, 118, 109, 57, 113, 121, 116, 114, 110, 116, 120, 104, 117, 99, 120, 97, 115, 53, 119, 118, 50, 50, 119, 107, 107, 109, 112, 48, 100, 118, 100, 100, 102, 112, 121, 115, 50, 6, 8, 2, 16, 235, 135, 40, 56, 128, 156, 132, 199, 145, 149, 163, 251, 22, 18, 105, 10, 81, 10, 70, 10, 31, 47, 99, 111, 115, 109, 111, 115, 46, 99, 114, 121, 112, 116, 111, 46, 115, 101, 99, 112, 50, 53, 54, 107, 49, 46, 80, 117, 98, 75, 101, 121, 18, 35, 10, 33, 2, 4, 140, 161, 71, 0, 89, 246, 217, 24, 222, 0, 133, 236, 174, 15, 97, 8, 137, 47, 209, 183, 44, 161, 50, 242, 28, 160, 188, 237, 206, 227, 131, 18, 4, 10, 2, 8, 127, 24, 222, 4, 18, 20, 10, 14, 10, 5, 117, 111, 115, 109, 111, 18, 5, 49, 49, 50, 53, 48, 16, 208, 187, 27, 26, 64, 37, 224, 237, 55, 83, 201, 44, 229, 46, 71, 31, 28, 67, 76, 132, 38, 103, 140, 121, 34, 202, 39, 93, 58, 144, 72, 211, 31, 135, 132, 237, 178, 52, 20, 46, 207, 228, 57, 150, 119, 111, 191, 17, 143, 91, 180, 152, 194, 197, 41, 100, 242, 181, 73, 22, 202, 241, 148, 211, 35, 166, 69, 41, 84]), proof: None }], total_count: 1 }

For further investigation, we need help from someone who understands the difference between the Tendermint tx_search and block_results API, and if the relayer is able to use the latter for reconstructing packet data & proofs to be able to clear this channel.

For developers:

Acceptance Criteria

  • channel-204 cleared up

Note: This is a IBC-wide problem, not specific to ibc-rs/Hermes. The relayer should be able to fix it, however. A large chunk of this investigation is due to Charly and coldchain.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate milestone (priority) applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@adizere adizere added I: dependencies Internal: related to dependencies A: blocked Admin: blocked by another (internal/external) issue or PR labels Jul 21, 2022
@adizere adizere modified the milestones: modules/v0.16, Backlog Jul 21, 2022
@ancazamfir
Copy link
Collaborator

For further investigation, we need help from someone who understands the difference between the Tendermint tx_search and block_results API, and if the relayer is able to use the latter for reconstructing packet data & proofs to be able to clear this channel.

Yes we should be able to use block_results RPC, maybe offer a CLI to clear pending events from a block at certain height. We are already using this as part of the search_blocks() for begin/ end block events. It shouldn't be hard to impl this.

@ancazamfir
Copy link
Collaborator

ancazamfir commented Jul 28, 2022

I have some diffs in anca/four_packets branch. packet-recv takes a new param packet-data-query-height that is used just for querying the packet data. The command to clear packets would be:

hermes --config four.toml tx packet-recv --dst-chain evmos_9001-2 --src-chain osmosis-1 --src-port transfer --src-channel channel-204 --packet-data-query-height 4713069

Works on a local setup. But couldn't run it on osmosis because the rpc nodes in the hermes config included in this issue are far and slow and hermes cannot finish the block_results. Same for https://rpc.osmosis.interbloc.org/block_results?height=4713069 or curl (fails around 14M).

$ hermes --config four.toml tx packet-recv --dst-chain evmos_9001-2 --src-chain osmosis-1 --src-port transfer --src-channel channel-204 --packet-data-query-height 4713069

2022-07-28T15:53:06.904226Z  INFO ThreadId(01) PacketRecvCmd{src_chain=osmosis-1 src_port=transfer src_channel=channel-204 dst_chain=evmos_9001-2}: unreceived packets found: 4
[relayer/src/link/packet_events.rs:68] &src_query_height = Height {
    revision: 1,
    height: 4713069,
}
2022-07-28T15:53:41.709760Z  WARN ThreadId(01) PacketRecvCmd{src_chain=osmosis-1 src_port=transfer src_channel=channel-204 dst_chain=evmos_9001-2}: encountered query failure while pulling packet data: link failed with underlying error: RPC error to endpoint https://rpc.osmosis.interbloc.org/: serde parse error: EOF while parsing a string at line 351873 column 30
SUCCESS []

@ancazamfir
Copy link
Collaborator

One more note, in case someone has better RPC nodes and is able to run these CLIs.
There seem to be 4 packets stuck, seqs: 127037, 127038, 127044 and 127045.
The first two were included in block at height 4713068 (has 319 Tx-es), the last two in block at height 4713069 (has 334 Tx-es). So we need to run packet-recv CLI twice with the two heights.

@ancazamfir
Copy link
Collaborator

The alternative is to craft the four packets and build the MsgReceivePacket from them. The logs with the packet data is included here.

@doggystylez
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: blocked Admin: blocked by another (internal/external) issue or PR I: dependencies Internal: related to dependencies
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants
@adizere @ancazamfir @doggystylez and others