Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ccan, fix latent bug in gossmap code that reveals #5375

Merged
merged 2 commits into from
Jul 11, 2022

Conversation

rustyrussell
Copy link
Contributor

Updating ccan to stricter htable (as in current ccan!) revealed we were trying to put
(void *)1 in the htable, which is forbidden:

topology: ccan/ccan/htable/htable.c:382: htable_add_: Assertion `entry_is_valid((uintptr_t)p)' failed.
topology: FATAL SIGNAL 6 (version 1358d7f)
0x55f30c689c34 send_backtrace
	common/daemon.c:33
0x55f30c689ce0 crashdump
	common/daemon.c:46
0x7f5d150fe51f ???
	./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x7f5d15152828 __pthread_kill_implementation
	./nptl/pthread_kill.c:44
0x7f5d15152828 __pthread_kill_internal
	./nptl/pthread_kill.c:80
0x7f5d15152828 __GI___pthread_kill
	./nptl/pthread_kill.c:91
0x7f5d150fe475 __GI_raise
	../sysdeps/posix/raise.c:26
0x7f5d150e47b6 __GI_abort
	./stdlib/abort.c:79
0x7f5d150e46da __assert_fail_base
	./assert/assert.c:92
0x7f5d150f5e25 __GI___assert_fail
	./assert/assert.c:101
0x55f30c6adbe4 htable_add_
	ccan/ccan/htable/htable.c:382
0x55f30c65f303 chanidx_htable_add
	common/gossmap.c:35
0x55f30c6605ed new_channel
	common/gossmap.c:337
0x55f30c6609cf add_channel
	common/gossmap.c:425
0x55f30c661101 map_catchup
	common/gossmap.c:607
0x55f30c66221e gossmap_refresh
	common/gossmap.c:927
0x55f30c66e3e9 get_gossmap
	plugins/topology.c:27
0x55f30c66f939 listpeers_done
	plugins/topology.c:369
0x55f30c671f46 handle_rpc_reply
	plugins/libplugin.c:558
0x55f30c672a19 rpc_read_response_one
	plugins/libplugin.c:726
0x55f30c672b4f rpc_conn_read_response
	plugins/libplugin.c:746
0x55f30c6ae35e next_plan
	ccan/ccan/io/io.c:59
0x55f30c6aef93 do_plan
	ccan/ccan/io/io.c:407
0x55f30c6aefd5 io_ready
	ccan/ccan/io/io.c:417
0x55f30c6b1371 io_loop
	ccan/ccan/io/poll.c:453
0x55f30c67587c plugin_main
	plugins/libplugin.c:1559
0x55f30c6708eb main
	plugins/topology.c:701
0x7f5d150e5fcf __libc_start_call_main
	../sysdeps/nptl/libc_start_call_main.h:58
0x7f5d150e607c __libc_start_main_impl
	../csu/libc-start.c:409
0x55f30c65d894 ???
	???:0
0xffffffffffffffff ???
	???:0

Changelog-None

@rustyrussell rustyrussell added this to the v0.12 milestone Jul 4, 2022
@@ -176,25 +176,25 @@ u32 gossmap_chan_idx(const struct gossmap *map, const struct gossmap_chan *chan)
return chan - map->chan_arr;
}

/* htable can't handle NULL values, so we add 1 */
/* htable can't handle NULL or 1 values, so we add 2 */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I maybe have missed your explanation in this bug fixing for ccan hash map, I think it is related to the memory pointer? that the hash map use like key?

Just curious on why the table can handle these values :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It uses these values to mean "empty" and "deleted". Now, usually it's OK, because it also puts hash bits in unused bits (bits which are the same for every value in the map), and there are usually lots of these: values are usually all heap pointers. But it's possible that the hash bits are all 0, in which case these would look like empty or deleted buckets.

It's always been possible, but the updated htable added an assertion to check for (void *)1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, putting 0x2 in the hash table before ccan/htable was updated caused another bug (which is what the htable code just fixed!). So these commits have to be together!

@rustyrussell rustyrussell changed the title gossmap: fix bug in gossmap code Update ccan, fix latent bug in gossmap code that reveals Jul 6, 2022
…ssertions.

Updating ccan to stricter htable revealed we were trying to put
(void *)1 in the htable, which is forbidden:

```
topology: ccan/ccan/htable/htable.c:382: htable_add_: Assertion `entry_is_valid((uintptr_t)p)' failed.
topology: FATAL SIGNAL 6 (version 1358d7f)
0x55f30c689c34 send_backtrace
	common/daemon.c:33
0x55f30c689ce0 crashdump
	common/daemon.c:46
0x7f5d150fe51f ???
	./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x7f5d15152828 __pthread_kill_implementation
	./nptl/pthread_kill.c:44
0x7f5d15152828 __pthread_kill_internal
	./nptl/pthread_kill.c:80
0x7f5d15152828 __GI___pthread_kill
	./nptl/pthread_kill.c:91
0x7f5d150fe475 __GI_raise
	../sysdeps/posix/raise.c:26
0x7f5d150e47b6 __GI_abort
	./stdlib/abort.c:79
0x7f5d150e46da __assert_fail_base
	./assert/assert.c:92
0x7f5d150f5e25 __GI___assert_fail
	./assert/assert.c:101
0x55f30c6adbe4 htable_add_
	ccan/ccan/htable/htable.c:382
0x55f30c65f303 chanidx_htable_add
	common/gossmap.c:35
0x55f30c6605ed new_channel
	common/gossmap.c:337
0x55f30c6609cf add_channel
	common/gossmap.c:425
0x55f30c661101 map_catchup
	common/gossmap.c:607
0x55f30c66221e gossmap_refresh
	common/gossmap.c:927
0x55f30c66e3e9 get_gossmap
	plugins/topology.c:27
0x55f30c66f939 listpeers_done
	plugins/topology.c:369
0x55f30c671f46 handle_rpc_reply
	plugins/libplugin.c:558
0x55f30c672a19 rpc_read_response_one
	plugins/libplugin.c:726
0x55f30c672b4f rpc_conn_read_response
	plugins/libplugin.c:746
0x55f30c6ae35e next_plan
	ccan/ccan/io/io.c:59
0x55f30c6aef93 do_plan
	ccan/ccan/io/io.c:407
0x55f30c6aefd5 io_ready
	ccan/ccan/io/io.c:417
0x55f30c6b1371 io_loop
	ccan/ccan/io/poll.c:453
0x55f30c67587c plugin_main
	plugins/libplugin.c:1559
0x55f30c6708eb main
	plugins/topology.c:701
0x7f5d150e5fcf __libc_start_call_main
	../sysdeps/nptl/libc_start_call_main.h:58
0x7f5d150e607c __libc_start_main_impl
	../csu/libc-start.c:409
0x55f30c65d894 ???
	???:0
0xffffffffffffffff ???
	???:0
```

Signed-off-by: Rusty Russell <[email protected]>
Caused a crash in CI, reproduced under valgrind by calling
any_channel_by_scid from io_poll_lightningd:

```
==2422524== Conditional jump or move depends on uninitialised value(s)
==2422524==    at 0x12C98D: any_channel_by_scid (channel.c:606)
==2422524==    by 0x14FF75: io_poll_lightningd (lightningd.c:682)
==2422524==    by 0x225FDE: io_loop (poll.c:420)
==2422524==    by 0x14A914: io_loop_with_timers (io_loop_with_timers.c:22)
==2422524==    by 0x150C4E: main (lightningd.c:1193)
==2422524==  Uninitialised value was created by a heap allocation
==2422524==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2422524==    by 0x234F61: allocate (tal.c:250)
==2422524==    by 0x235522: tal_alloc_ (tal.c:428)
==2422524==    by 0x12B500: new_unsaved_channel (channel.c:203)
==2422524==    by 0x13B77A: json_openchannel_init (dual_open_control.c:2610)
==2422524==    by 0x14C78D: command_exec (jsonrpc.c:630)
==2422524==    by 0x14CD9F: rpc_command_hook_final (jsonrpc.c:765)
==2422524==    by 0x181DDA: plugin_hook_call_ (plugin_hook.c:278)
==2422524==    by 0x14D198: plugin_hook_call_rpc_command (jsonrpc.c:853)
==2422524==    by 0x14D6A0: parse_request (jsonrpc.c:957)
==2422524==    by 0x14DAFE: read_json (jsonrpc.c:1054)
==2422524==    by 0x2231C8: next_plan (io.c:59)
```

Signed-off-by: Rusty Russell <[email protected]>
@niftynei
Copy link
Collaborator

ACK 388742d

confirmed that test fails are all flakes,

@niftynei niftynei merged commit ae71a87 into ElementsProject:master Jul 11, 2022
cdecker added a commit to cdecker/lightning that referenced this pull request Jul 11, 2022
The gossip_store version was bumbed to 10 in PR ElementsProject#5239 and ElementsProject#5375 has
apparently not been rebased on top of that when it was merged, causing
the `run-gossmap_canned` test to fail when parsing the v9 gossip
store.

Changelog-None
cdecker added a commit to cdecker/lightning that referenced this pull request Jul 11, 2022
The gossip_store version was bumbed to 10 in PR ElementsProject#5239 and ElementsProject#5375 has
apparently not been rebased on top of that when it was merged, causing
the `run-gossmap_canned` test to fail when parsing the v9 gossip
store.

Changelog-None
cdecker added a commit to cdecker/lightning that referenced this pull request Jul 11, 2022
The gossip_store version was bumbed to 10 in PR ElementsProject#5239 and ElementsProject#5375 has
apparently not been rebased on top of that when it was merged, causing
the `run-gossmap_canned` test to fail when parsing the v9 gossip
store.

Changelog-None
cdecker added a commit to cdecker/lightning that referenced this pull request Jul 11, 2022
The gossip_store version was bumbed to 10 in PR ElementsProject#5239 and ElementsProject#5375 has
apparently not been rebased on top of that when it was merged, causing
the `run-gossmap_canned` test to fail when parsing the v9 gossip
store.

Changelog-None
cdecker added a commit to cdecker/lightning that referenced this pull request Jul 11, 2022
The gossip_store version was bumbed to 10 in PR ElementsProject#5239 and ElementsProject#5375 has
apparently not been rebased on top of that when it was merged, causing
the `run-gossmap_canned` test to fail when parsing the v9 gossip
store.

Changelog-None
niftynei pushed a commit that referenced this pull request Jul 11, 2022
The gossip_store version was bumbed to 10 in PR #5239 and #5375 has
apparently not been rebased on top of that when it was merged, causing
the `run-gossmap_canned` test to fail when parsing the v9 gossip
store.

Changelog-None
whitslack pushed a commit to whitslack/lightning that referenced this pull request Oct 8, 2022
The gossip_store version was bumbed to 10 in PR ElementsProject#5239 and ElementsProject#5375 has
apparently not been rebased on top of that when it was merged, causing
the `run-gossmap_canned` test to fail when parsing the v9 gossip
store.

Changelog-None
whitslack pushed a commit to whitslack/lightning that referenced this pull request Oct 28, 2022
The gossip_store version was bumbed to 10 in PR ElementsProject#5239 and ElementsProject#5375 has
apparently not been rebased on top of that when it was merged, causing
the `run-gossmap_canned` test to fail when parsing the v9 gossip
store.

Changelog-None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants