
FATAL: Caught exception: It appears that Fulcrum was forcefully killed in the middle of committing a block... #41

Open
ghost opened this issue Jul 25, 2020 · 61 comments
Labels
question Further information is requested

Comments

@ghost

ghost commented Jul 25, 2020

Great SPV server! It is fast and just what I needed.

I've encountered a problem when restarting Fulcrum:

FATAL: Caught exception: It appears that Fulcrum was forcefully killed in the middle of committing a block to the db. We cannot figure out where exactly in the update process Fulcrum was killed, so we cannot undo the inconsistent state caused by the unexpected shutdown. Sorry!

I restarted the server (via systemd) and also created an image snapshot for the server and then this error appears.

Having used Electrum Cash for a while, I never had any problems with this before (for years) even though I would restart/terminate instances abruptly.

Is there any way to prevent this problem from happening, or to gracefully recover? Thank you!

@cculianu
Owner

cculianu commented Jul 25, 2020

Yeah it's a known issue with the way I did the data layout. I will have to redesign the data layout to avoid this in a future version. The recommended way to stop Fulcrum is to send it SIGINT and wait a good 60 seconds. (Usually it's done in 5-10s). See if you can configure systemd to send SIGINT or SIGTERM and have it wait for completion and not kill the process right away. I believe on most systems by default it does wait 30s or more...
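
For example (purely illustrative -- the paths, unit settings, and 300-second timeout below are assumptions, not Fulcrum defaults), something along these lines in the service unit gives Fulcrum time to finish its writes before systemd escalates to SIGKILL:

# example unit settings; adjust the paths to your own install
[Service]
ExecStart=/usr/local/bin/Fulcrum /etc/fulcrum/fulcrum.conf
KillSignal=SIGINT
TimeoutStopSec=300

systemd only falls back to SIGKILL after TimeoutStopSec expires, so a graceful flush normally completes well within that window.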

You will have to resynch, unfortunately. :/ Sorry about that.

A future version will try to be ACID -- but for now I took speed shortcuts -- so a hard shutdown runs the risk of this issue if it happens while a block has just arrived and the DB is being updated.

I understand that ElectrumX did not suffer from this. It was also slower. :)

I will see if I can do ACID without too much of a perf. hit in a future version. For now you will have to resynch from scratch though. Sorry...

If this makes you worried you can always also backup the synched DB (with Fulcrum stopped). That way you can always restore from backup. FWIW I have been running my server for months now and never had to restore from backup.

Sorry about that.

@cculianu cculianu added the question Further information is requested label Jul 25, 2020
@ghost
Author

ghost commented Jul 25, 2020

Awesome! Thanks for the quick reply.

Yeah it's a known issue with the way I did the data layout. I will have to redesign the data layout to avoid this in a future version.

Would you be able to give me a hint on where to look so that I can implement it?

The problem is not that I force-kill the process; the problem is that the server could be under high load and the OS terminates the process abruptly.

I'm curious about the locking here:

https://github.com/cculianu/Fulcrum/blob/master/src/Storage.cpp#L1232

Does this mean that while we are processing a block, no one can query mempool/UTXOs (i.e., blocked threads) until the block is committed?

Thanks so much!

@georgengelmann

I restarted the server (via systemd) and also created an image snapshot for the server and then this error appears.

What's in your systemd file? I don't have RestartSec=60s in it, but it never crashed.

@cculianu
Owner

cculianu commented Jul 26, 2020

@atteeela --

Would you be able to give me a hint on where to look so that I can implement it?

It's not a trivial fix. It would require redesigning the database to use a single table with different "column families" for each of the logical pieces: utxo_set, scripthash_history, headers, etc. If all of the data lives in a single table it's possible to do "begin transaction", "end transaction" pairs when updating the data as new blocks arrive, and it would be ACID -- in that case even yanking the power cord or whatever will not lead to any corruption (just a rollback). So, given that, it's more than a quick hackjob -- a person with significant experience in rocksdb and C++ would be needed to do this. I can do this myself -- the reason I didn't do it initially is that it was slower than the data layout I have now. I wanted to design this server to be as fast as possible.

Potentially the new data layout would be optional and only for users that prefer ACID over speed. So -- the database layer would need to be abstracted a bit to handle both data layouts.

It's not a small job; I can do it myself -- it just would take me a lot of time.
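
To make the idea concrete, one way to get that atomicity in rocksdb is to put all of a block's writes into a single WriteBatch spanning the column families. Below is a bare-bones sketch of that approach only -- the column family names, keys, and values are placeholders, not Fulcrum's actual schema:

// A single atomic WriteBatch across column families -- placeholder names and
// keys only, not Fulcrum's actual schema.
#include <rocksdb/db.h>
#include <rocksdb/options.h>
#include <rocksdb/write_batch.h>
#include <cassert>
#include <vector>

int main() {
    rocksdb::Options opts;
    opts.create_if_missing = true;
    opts.create_missing_column_families = true;

    // One column family per logical table (illustrative names).
    std::vector<rocksdb::ColumnFamilyDescriptor> cfs{
        {rocksdb::kDefaultColumnFamilyName, rocksdb::ColumnFamilyOptions()},
        {"utxo_set", rocksdb::ColumnFamilyOptions()},
        {"scripthash_history", rocksdb::ColumnFamilyOptions()},
        {"headers", rocksdb::ColumnFamilyOptions()},
    };
    std::vector<rocksdb::ColumnFamilyHandle *> handles;
    rocksdb::DB *db = nullptr;
    rocksdb::Status st = rocksdb::DB::Open(opts, "/tmp/acid_sketch_db", cfs, &handles, &db);
    assert(st.ok());

    // Everything a block touches goes into one batch; rocksdb commits it atomically,
    // so after a crash the DB is either entirely before or entirely after the block.
    rocksdb::WriteBatch batch;
    batch.Put(handles[3], "header:700000", "<serialized header>");
    batch.Put(handles[1], "txid:vout", "<new utxo>");
    batch.Delete(handles[1], "spent_txid:vout");
    batch.Put(handles[2], "scripthash", "<updated history>");

    rocksdb::WriteOptions wopts;
    wopts.sync = true;  // fsync the write-ahead log before returning
    st = db->Write(wopts, &batch);
    assert(st.ok());

    for (auto *h : handles) db->DestroyColumnFamilyHandle(h);
    delete db;
}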

Does this mean that while we are processing a block, no one can query mempool/UTXOs (i.e., blocked threads) until the block is committed?

Yes, that's correct. Typically the locks are held for less than 5ms (sometimes less than 1ms), and blocks arrive once every 10 minutes, so it's not exactly a huge deal. This amounts to locks being held exclusively about 0.00083% of the time, on average. You may get more of a burp and slowdown from your OS kernel than from this. The rest of the time everything is very much parallel.

@vul-ture

vul-ture commented Oct 3, 2020

I'm running into this issue. It's annoying that I have to start sync over from the beginning. I tried tweaking the db_max_open_files, max_pending_connections, and bitcoind_throttle values down but it's still happening around block 450,000. Can anyone recommend a workaround? Thanks

@cculianu
Owner

cculianu commented Oct 3, 2020

@vul-ture Is it happening on initial synch? Or later?

Don't kill Fulcrum with kill -9 -- make sure you wait for it to gracefully shut down...

@vul-ture

vul-ture commented Oct 3, 2020

It happens on initial sync; I'm not even sending a signal to Fulcrum. It could be that my RPC connections are getting saturated(?).
I'm enabling debugging and stats and will update.

@cculianu
Owner

cculianu commented Oct 3, 2020

Huh? If you haven't synched yet RPC is not even up yet.

There are two possibilities here:

  1. You are out of disk space
  2. You are out of disk space

Please ensure that the directory you use is on a filesystem that has ~40GB free for mainnet.

@cculianu
Owner

cculianu commented Oct 3, 2020

Also please ensure you are using a filesystem that supports >2GB files... (e.g. no FAT32 or other ancient filesystem).
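
A quick way to check both at once (the datadir path here is just a placeholder -- substitute your own):

# show free space and filesystem type for the Fulcrum datadir
$ df -hT /path/to/fulcrum_datadir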

@vul-ture

vul-ture commented Oct 3, 2020

I meant the RPC connections to the bitcoin daemon.
I'm using an ext4 filesystem with 120GB free.

@cculianu
Owner

cculianu commented Oct 3, 2020

RPC to bitcoind being too slow shouldn't lead to this error message -- you would instead see some warnings about connections dropped / reconnect to bitcoind -- but it would recover from that.

@vul-ture

vul-ture commented Oct 3, 2020

Right, the daemon connection is fast as well. It must be a DB write issue. Performance of this app is excellent, ~10k transactions/sec. If I can work around this error and sync, it should work fine. My specs are pretty good, so I don't think it's a hardware issue.

@cculianu
Owner

cculianu commented Oct 3, 2020

Ok well without more info I can't help. I still think somehow your data dir is not on a filesystem with enough space ...

Maybe some verbose logging will elucidate things, one might hope.

@vul-ture

vul-ture commented Oct 4, 2020

OK I'm able to reproduce the problem, this might be a different issue than the original bug.

[2020-10-04 12:19:27.431] Verifying headers ...
[2020-10-04 12:19:27.431] (Debug) Verifying 481460 headers ...
[2020-10-04 12:19:28.518] (Debug) Read & verified 481460 headers from db in 1086.771 msec
[2020-10-04 12:19:28.518] Initializing header merkle cache ...
[2020-10-04 12:19:29.066] (Debug) Merkle cache initialized to length 481460
[2020-10-04 12:19:29.079] (Debug) Read TxNumNext from file: 248286376
[2020-10-04 12:19:29.079] Checking tx counts ...
[2020-10-04 12:19:30.933] 248286376 total transactions
[2020-10-04 12:19:30.933] UTXO set: 50838980 utxos, 4270.474 MB
[2020-10-04 12:19:30.990] (Debug) Storage starting thread
[2020-10-04 12:19:30.991] BitcoinDMgr: starting 3 bitcoin rpc clients ...
[2020-10-04 12:19:30.991] (Debug) Changed pingtime_ms: 10000
[2020-10-04 12:19:30.991] (Debug) BitcoinD.1 starting thread
[2020-10-04 12:19:30.991] (Debug) Changed pingtime_ms: 10000
[2020-10-04 12:19:30.991] (Debug) BitcoinD.2 starting thread
[2020-10-04 12:19:30.991] (Debug) Changed pingtime_ms: 10000
[2020-10-04 12:19:30.991] (Debug) BitcoinD.3 starting thread
[2020-10-04 12:19:30.991] (Debug) BitcoinDMgr starting thread
[2020-10-04 12:19:30.992] BitcoinDMgr: started ok
[2020-10-04 12:19:30.992] (Debug) Controller starting thread
[2020-10-04 12:19:30.991] <BitcoinD.1> (Debug) TCP BitcoinD.1 (id: 2) socket state: 1
[2020-10-04 12:19:30.991] <BitcoinD.2> (Debug) TCP BitcoinD.2 (id: 3) socket state: 1
[2020-10-04 12:19:30.991] <BitcoinD.2> (Debug) TCP BitcoinD.2 (id: 3) socket state: 2
[2020-10-04 12:19:30.991] <BitcoinD.1> (Debug) TCP BitcoinD.1 (id: 2) socket state: 2
[2020-10-04 12:19:30.992] <BitcoinD.3> (Debug) TCP BitcoinD.3 (id: 4) socket state: 1
[2020-10-04 12:19:30.992] <BitcoinD.3> (Debug) TCP BitcoinD.3 (id: 4) socket state: 2
[2020-10-04 12:19:31.002] <BitcoinD.2> (Debug) TCP BitcoinD.2 (id: 3) 10.10.1.2:8332 socket state: 3
[2020-10-04 12:19:31.002] <BitcoinD.2> (Debug) on_connected 3
[2020-10-04 12:19:31.002] <BitcoinD.1> (Debug) TCP BitcoinD.1 (id: 2) 10.10.1.2:8332 socket state: 3
[2020-10-04 12:19:31.002] <BitcoinD.3> (Debug) TCP BitcoinD.3 (id: 4) 10.10.1.2:8332 socket state: 3
[2020-10-04 12:19:31.002] <BitcoinD.1> (Debug) on_connected 2
[2020-10-04 12:19:31.002] <BitcoinD.3> (Debug) on_connected 4
[2020-10-04 12:19:31.004] (Debug) Auth recvd from bicoind with id: 3, proceeding with processing ...
[2020-10-04 12:19:31.006] (Debug) Refreshed version info from bitcoind, version: 0.16.3, subversion: /Satoshi:0.16.3/
[2020-10-04 12:19:31.007] (Debug) Refreshed genesis hash from bitcoind: 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
[2020-10-04 12:19:31.101] Block height 651269, downloading new blocks ...
[2020-10-04 12:19:31.101] (Debug) Task.DL 481460 -> 651269 starting thread
[2020-10-04 12:19:31.101] (Debug) Task.DL 481461 -> 651269 starting thread
[2020-10-04 12:19:31.101] (Debug) Task.DL 481462 -> 651269 starting thread
[2020-10-04 12:19:31.101] (Debug) Task.DL 481463 -> 651269 starting thread
[2020-10-04 12:19:31.102] (Debug) Task.DL 481464 -> 651269 starting thread
[2020-10-04 12:19:31.102] (Debug) Task.DL 481465 -> 651269 starting thread
[2020-10-04 12:19:31.102] (Debug) Task.DL 481466 -> 651269 starting thread
[2020-10-04 12:20:32.059] <Task.DL 481460 -> 651269> [Qt Warning] Qt has caught an exception thrown from an event handler. Throwing
exceptions from an event handler is not supported in Qt.
You must not let any exception whatsoever propagate through Qt code.
If that is not possible, in Qt 5 you must at least reimplement
QCoreApplication::notify() and catch all exceptions there.
(:0, )

console reports
(:0, )
what(): ReadCompactSize(): size too large: iostream error

Looks like my bitcoind might have a corrupted block? investigating

@cculianu
Owner

cculianu commented Oct 4, 2020

Oh man! You’re on bitcoin core (btc).

Fulcrum is for Bitcoin Cash.

However you piqued my curiosity. I will try and get it working with bitcoin btc (maybe) in the coming week or two.

Yeah fulcrum is for bch...

That explains it!

@vul-ture

vul-ture commented Oct 4, 2020

Lol oops, sorry about that! I will use it for BCH then
and thanks for taking a look at Core.
Looks like I synched a little past the BCH fork block (478559) and then it died.

@cculianu
Owner

cculianu commented Nov 4, 2020

Hey @vul-ture if you want you can try using Fulcrum with BTC if you start bitcoind on BTC with -rpcserialversion=0. I think in that case the bitcoind will "speak the same language" as Fulcrum and it may succeed in syncing.

I am going to test this myself but from my reading of the bitcoin core sourcecode it should work.
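
For reference, the flag can go on the bitcoind command line or in bitcoin.conf (the two forms below are equivalent; this is only to show where it goes):

# command line
$ bitcoind -rpcserialversion=0
# or in bitcoin.conf
rpcserialversion=0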

@vul-ture

testing now with the rpcserialversion flag

@cculianu
Owner

cculianu commented Nov 14, 2020

Hey! @vul-ture So actually latest Fulcrum release fully supports BTC. Don't use that flag -- it will give you problems later when you connect Electrum to it.

Just get rid of that flag, blow away your existing Fulcrum db, and resynch again from scratch. When it finishes synching you can serve up BTC. I have been doing so for the past week without problems. I have 222 users connected right now.

@vul-ture

Working, thanks! I think this is the fastest electrum server I've seen.

@cculianu
Owner

Wow man thanks for the compliment. :) Yeah I tried to make it fast FAST. That was my #1 design goal. (And of course also correctness was too).

Thanks!

@cculianu
Owner

@vul-ture PS: Be sure to be on latest Fulcrum 1.4.0 since in the 1.3.x series BTC support was still beta and the mempool code wasn't as fast as it is in 1.4.x series... If you aren't already on Fulcrum 1.4.x, upgrading is simple and doesn't require a db resynch.

@apemithrandir

Just tagging a comment to say I ran into this issue too. I had to forcibly kill the VM that Fulcrum was running on. Then when I brought the VM back up I got this error.
Will have to re-sync now.
Aside from this very happy with the server.

@cculianu
Owner

@apemithrandir I see. Sorry to hear that. I do plan on making Fulcrum more resilient to unfortunately-timed crashes in the future. It is a partially solvable problem, meaning that we can get it to a state where for 99% of crashes it should be able to auto-recover. It just requires more logic in the code to diagnose what went wrong and backtrack a bit. Thanks for the feedback. I have never had to re-sync Fulcrum after a hard system reset, and I've had my server randomly lose power or be forced to randomly reset about 12 times in the last 3 years (I was running my server out of my apartment at one point, without battery backup, ha ha).

I sort of optimistically thought that such corruption issues would be rare. But this just proves that anything that can go wrong will, eventually, given a large enough install base.

@apemithrandir

I think it was a combination of my VM being off for a few days and then being brought back online. Bitcoind was still grabbing blocks and Fulcrum was also still updating. My VM was acting laggy/unresponsive and then I went to restart. When restarting the VM, it just hung on a black screen and I had to force a power-off.
It is the first time that my Fulcrum has required a re-sync like this. My machine has crashed before, but normally when the chain was up to date. With an up-to-date chain, the chance that a block was being processed at the time of the crash is very low.

@craigraw

Although it pains me to say it, I would give up a little of Fulcrum's stunning performance (if necessary) to have this implemented. Although it tends to happen more during initial indexing (often due to an overly optimistic fast-sync configuration), as you note it's inevitable that with a large enough user base there will be abnormal shutdowns during indexing, particularly with many RPi nodes running without UPS backup. This has been the main technical consideration I have come across from implementors looking to switch from Electrs.

@cculianu
Owner

Yeah, it only happens if Fulcrum was in the process of writing out data for a new block -- so during a catch-up phase or a sync it can happen after a non-graceful exit. Under normal operation it's unlikely since a block arrives once every 10 mins and only takes 20-100msec to process, depending on CPU and HD speed..

But yes, this is solvable and I will focus on that in the future. 100% agreed.

@apemithrandir

apemithrandir commented Mar 19, 2022

I am struggling to get through the re-sync without hitting this error. I'm on my 3rd attempt now. This is the most recent forceful kill log:

  1. Mar 19 XX:23:47 XXXX-ubuntu Fulcrum[3429]: [2022-03-19 XX:23:47.832] Processed height: 442000, 60.7%, 4.09 blocks/sec, 7561.0 txs/sec, 27767.0 addrs/sec
  2. Mar 19 XX:24:25 XXXX-ubuntu Fulcrum[3429]: [2022-03-19 XX:24:25.277] Storage UTXO Cache: Flushing to DB ...
  3. Mar 19 XX:25:23 XXXX-ubuntu kernel: [14761.013748] [ 3429] 1000 3429 3940306 2735552 28819456 63613 0 Fulcrum
  4. Mar 19 XX:25:23 XXXX-ubuntu kernel: [14761.013981] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/fulcrum.service,task=Fulcrum,pid=3429,uid=1000
  5. Mar 19 XX:25:23 XXXX-ubuntu kernel: [14761.014012] Out of memory: Killed process 3429 (Fulcrum) total-vm:15761224kB, anon-rss:10942208kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:28144kB oom_score_adj:0
  6. Mar 19 XX:25:24 XXXX-ubuntu kernel: [14762.247928] oom_reaper: reaped process 3429 (Fulcrum), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
  7. Mar 19 XX:25:24 XXXX-ubuntu systemd[1]: Stopped Fulcrum.
  8. Mar 19 XX:25:24 XXXX-ubuntu systemd[1]: Started Fulcrum.
  9. Mar 19 XX:25:25 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:25:25.198] Loaded SSL certificate: Internet Widgits Pty Ltd expires: Sun February 8 2032 XX:29:08
  10. Mar 19 XX:25:25 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:25:25.200] Loaded key type: private algorithm: RSA
  11. Mar 19 XX:25:25 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:25:25.202] Enabled JSON parser: simdjson
  12. Mar 19 XX:25:25 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:25:25.202] simdjson implementations:
  13. Mar 19 XX:25:25 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:25:25.202] haswell: Intel/AMD AVX2 [supported]
  14. Mar 19 XX:25:25 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:25:25.202] westmere: Intel/AMD SSE4.2 [supported]
  15. Mar 19 XX:25:25 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:25:25.202] fallback: Generic fallback implementation [supported]
  16. Mar 19 XX:25:25 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:25:25.202] active implementation: haswell
  17. Mar 19 XX:25:25 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:25:25.205] jemalloc: version 5.2.1-0-gea6b3e9
  18. Mar 19 XX:25:25 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:25:25.205] Qt: version 5.15.2
  19. Mar 19 XX:25:25 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:25:25.205] rocksdb: version 6.14.6-ed43161
  20. Mar 19 XX:25:25 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:25:25.205] simdjson: version 0.6.0
  21. Mar 19 XX:25:25 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:25:25.205] ssl: OpenSSL 1.1.1f 31 Mar 2020
  22. Mar 19 XX:25:25 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:25:25.205] zmq: libzmq version: 4.3.3, cppzmq version: 4.7.1
  23. Mar 19 XX:25:25 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:25:25.205] Fulcrum 1.6.0 (Release 5e95c0f) - Sat Mar 19, 2022 XX:25:25.205 XXXX - starting up ...
  24. Mar 19 XX:25:25 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:25:25.205] Max open files: 8192
  25. Mar 19 XX:25:25 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:25:25.207] Loading database ...
  26. Mar 19 XX:26:09 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:26:09.186] DB memory: 1024.00 MiB
  27. Mar 19 XX:26:09 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:26:09.187] Coin: BTC
  28. Mar 19 XX:26:09 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:26:09.188] Chain: main
  29. Mar 19 XX:26:09 XXXX-ubuntu Fulcrum[3940]: [2022-03-19 XX:26:09.195] FATAL: Caught exception: It appears that Fulcrum was forcefully killed in the middle of committing a block to the db. We cannot figure out where exactly in the update process Fulcrum was killed, so we cannot undo the inconsistent state caused by the unexpected shutdown. Sorry!
  30. Mar 19 XX:26:09 XXXX-ubuntu Fulcrum[3940]: The database has been corrupted. Please delete the datadir and resynch to bitcoind.

I was using this in fulcrum.conf:

bitcoind_timeout = 300
bitcoind_clients = 1
worker_threads = 1
db_mem=1024
db_max_open_files=200
fast-sync = 1024

Any suggestions?

@apemithrandir

Maybe it is a problem with my VM, but twice now when I rebooted my machine after Fulcrum failed, I have been booted into an initramfs/busybox prompt and had to manually run fsck to recover my filesystem.

@cculianu
Owner

I am really sorry this is happening. So there must be a bug in either the jemalloc allocator Fulcrum uses or, alternatively, in the robin_hood::unordered_map (the internal data structure used to store the UTXOs while synching). I actually suspect robin_hood, since it has had issues in the past. It's worrying that it takes memory and never gives it up... in some situations.

The strange thing is that I have synched recently on BTC just to test things out and I never observed this behavior. So it may be something specific that triggers it.

Yes, the more I think about it.. it could be that robin_hood is to blame. I will investigate this further. Thank you for the info.

@cculianu
Owner

Follow-up: I predict without fast-sync it won't fail. This would be evidence that robin_hood is to blame since it's only used for synching.

@apemithrandir

apemithrandir commented Mar 20, 2022

Ok. I assume commenting out the fast-sync line in my fulcrum.conf is how I run without fast-sync. I did run into my CPU maxing out during the last run, so I've set worker_threads back to 1 from 2.
I'll give it one more go.
Also let me know if you want me to DM you more logs or anything else that you might need to bug hunt.
Edit: I got this when I started it this time:
"<Controller> fast-sync: Not enabled"
So I will see how I get on.

@apemithrandir

After over 2 days of syncing (with fast-sync disabled), at block height 653,000 my CPU locked up again. Since I had worker_threads=1, the CPU locked at < 100%, but the VM was still unresponsive.

@apemithrandir

Sorry to say, I was unable to get Fulcrum up and fully sync'd after a week of trying. My VM kept crashing or freezing during the re-sync, forcing me to do another re-sync.
I'm not willing to re-build my VM from scratch at the moment so I will have to settle for the less performant ElectrumX server for now.

@cculianu
Owner

I'm sorry to hear that, @apemithrandir . I have had little trouble synching it even on old Windows 7 boxes (yes, there is a windows .exe available) with like 4GB of RAM and HDD. It's perplexing that it would fail on what sounds like more generous hardware. I'm just curious -- can you provide more details about your setup? Like host os, guest os, VM software, VM configuration, host machine specs, and relevant parts of config file, etc. Anything helps. I want to see if I can reproduce the issues you experienced.

Sorry to hear you are going :(.

What about running on bare metal outside a VM? Or using Docker?

@apemithrandir

apemithrandir commented Mar 24, 2022

I'm sorry to hear that, @apemithrandir . I have had little trouble synching it even on old Windows 7 boxes (yes, there is a windows .exe available) with like 4GB of RAM and HDD. It's perplexing that it would fail on what sounds like more generous hardware. I'm just curious -- can you provide more details about your setup? Like host os, guest os, VM software, VM configuration, host machine specs, and relevant parts of config file, etc. Anything helps. I want to see if I can reproduce the issues you experienced.

Sorry to hear you are going :(.

What about running on bare metal outside a VM? Or using Docker?

Happy to share any and all details with you one on one over private message/email, if it might help you with development.

@caheredia

Yeah it's a known issue with the way I did the data layout. I will have to redesign the data layout to avoid this in a future version. The recommended way to stop Fulcrum is to send it SIGINT and wait a good 60 seconds. (Usually it's done in 5-10s). See if you can configure systemd to send SIGINT or SIGTERM and have it wait for completion and not kill the process right away. I believe on most systems by default it does wait 30s or more...

You will have to resynch, unfortunately. :/ Sorry about that.

A future version will try to be ACID -- but for now I took speed shortcuts -- so a hard shutdown runs the risk of this issue if it happens while a block has just arrived and the DB is being updated.

I understand that ElectrumX did not suffer from this. It was also slower. :)

I will see if I can do ACID without too much of a perf. hit in a future version. For now you will have to resynch from scratch though. Sorry...

If this makes you worried you can always also backup the synched DB (with Fulcrum stopped). That way you can always restore from backup. FWIW I have been running my server for months now and never had to restore from backup.

Sorry about that.

I just experienced the same thing. My VM rebooted for updates.

[2022-05-21 20:06:24.266] Coin: BTC

[2022-05-21 20:06:24.266] Chain: main

[2022-05-21 20:06:24.267] FATAL: Caught exception: It appears that Fulcrum was forcefully killed in the middle of committing a block to the db. We cannot figure out where exactly in the update process Fulcrum was killed, so we cannot undo the inconsistent state caused by the unexpected shutdown. Sorry!

The database has been corrupted. Please delete the datadir and resynch to bitcoind.

[2022-05-21 20:06:24.268] Stopping Controller ... 

[2022-05-21 20:06:24.268] Closing storage ...

[2022-05-21 20:06:24.341] Shutdown complete

@cculianu
Owner

Yes, I'm sorry. This happens if it's killed while it's busy processing a block. Perhaps you should set up Fulcrum as a systemd service; that way any reboot of the node will send it a SIGTERM (or similar) so it can shut down gracefully.

I'm sorry. You must resynch now...

@caheredia

caheredia commented May 21, 2022

Yes, I'm sorry. This happens if it's killed while it's busy processing a block. Perhaps you should set up Fulcrum as a systemd service; that way any reboot of the node will send it a SIGTERM (or similar) so it can shut down gracefully.

I'm sorry. You must resynch now...

I'm running it inside a docker container, so I'll have to figure out a graceful exit strategy. I appreciate the reply.
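
Something like the following looks like it should do it -- the container name, image placeholder, and 300-second timeout are just illustrative, not values from the Fulcrum docs:

# stop a running container, waiting up to 300s before Docker escalates to SIGKILL
$ docker stop --time 300 fulcrum

# or bake the behaviour in when the container is created
$ docker run --stop-signal SIGINT --stop-timeout 300 ... <fulcrum-image>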

@cculianu
Owner

Yeah, I'm sorry. I will have to redo the data model to be fully ACID and then this can never happen. This is on the to-do list for Fulcrum 2.0. Sorry about that.

@caheredia

Yeah, I'm sorry. I will have to redo the data model to be fully ACID and then this can never happen. This is on the to-do list for Fulcrum 2.0. Sorry about that.

Looking forward to it. Thanks for prioritizing it in the project. I'd offer to help, but I mostly code in python.

@jonscoresby

Just wanted to say that I downloaded and synced Fulcrum about a month ago and had syncing problems. I tried syncing Fulcrum on Bitcoin 3 times and it would crash somewhere after block 400,000 after running out of memory. I tried disabling autosuspend as suggested here, but the sync still failed. After disabling fast-sync, however, the sync was successful.

@cculianu
Owner

Thanks for the feedback @jonscoresby ... Perhaps I need to go back and see if I can make --fast-sync more resilient to such conditions. Just curious: were you using swap at all or did you have swap disabled?

@RequestPrivacy

Just wanted to swing by and report that I seem to have the same problem: I tried to index on a Raspberry Pi with 4GB and it flooded my RAM to the point that it exited with the above error message (first try with fast-sync = 1024, second try with fast-sync = 512; when I noticed it filled my RAM again I set it to 200MB). But it crashed once more.

So I disabled fast-sync and now it's humming away slowly but steadily at 2.7GB (the baseline was something like 1.3GB of RAM).

@cculianu
Owner

Yeah, I really need to set aside some time and have the fast-sync option auto-detect this situation on OSes such as Linux that overcommit, and prune it down if that happens. I definitely will work on this soon!

@jonscoresby

Thanks for the feedback @jonscoresby ... Perhaps I need to go back and see if I can make --fast-sync more resilient to such conditions. Just curious: were you using swap at all or did you have swap disabled?

Sorry I didn't see this. I do not have swap enabled.

@cculianu
Owner

cculianu commented Aug 20, 2022

Ah I see. Thanks for getting back to me.

Yeah I have a hypothesis that this is more likely to happen in the "no swapfile" situation. I am not sure why it became fashionable to ship Linux installs these days with no swap. I remember a time when every Linux install had a default swapfile setup. At some point that changed. Anyway -- I think that in the no-swapfile case, memory usage can get out-of-hand temporarily with --fast-sync and rocksdb both gobbling up RAM. And, of course, if there's no swap.. when you are out of RAM .. something must die. And that thing is Fulcrum.

I can't fully control memory usage (because rocksdb lib does its own thing and sometimes overallocates memory temporarily even when you tell it not to). I can, however, mitigate this by detecting the situation and controlling the --fast-sync memory usage .. if it looks like we are reaching the system limit, I can just prune the cache temporarily to be smaller than what the user specified.. or something like that.
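
For anyone on a no-swap install who wants a stopgap in the meantime, adding a swapfile is straightforward -- the 8G size and /swapfile path below are arbitrary examples:

# create and enable a swapfile (example size and path)
$ sudo fallocate -l 8G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
# make it persistent across reboots
$ echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab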

@RequestPrivacy

Also no swapfile on my linux.

Let me know if I should test something once you might have figured out a solution.

@chrisguida

Please, please, please fix this. We are trying to package Fulcrum for embassyOS and this makes the otherwise amazing experience very painful. It can take several days to build the index on a low-resource device in docker, and to be told that you have to do it all over again is enough to make the user want to simply delete it and switch back to electrs.

@cculianu
Owner

cculianu commented Jan 18, 2023

I will fix it in a future release; that's the plan.

Please don't use --fast-sync; it eats memory and is experimental. It's not really suited for systems with low memory and no swap. It shouldn't ever crash on initial synch as often as it does -- and I noticed everybody is using that option -- which probably is leading to OOM? I should have named it differently...

@craigraw

I will fix it in a future release; that's the plan.

That's great to hear. I've also noticed that --fast-sync is often configured with values that are far too high for the system. Perhaps Fulcrum should warn if it's set to say > 20% of system memory?
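
Roughly something like this -- a hypothetical sketch of the suggested check, not Fulcrum code; the configured value and the 20% threshold are placeholders:

// Hypothetical startup warning: compare the configured fast-sync budget to physical RAM.
#include <unistd.h>
#include <cstdint>
#include <iostream>

int main() {
    const std::uint64_t fastSyncMB = 8192;  // placeholder for the value read from fulcrum.conf
    const std::uint64_t totalMB = std::uint64_t(sysconf(_SC_PHYS_PAGES))
                                  * std::uint64_t(sysconf(_SC_PAGESIZE)) / (1024 * 1024);
    if (totalMB > 0 && fastSyncMB > totalMB / 5)  // the ">20% of system memory" rule of thumb
        std::cerr << "Warning: fast-sync (" << fastSyncMB << " MiB) exceeds 20% of system RAM ("
                  << totalMB << " MiB); consider lowering or disabling it.\n";
}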

That said, I do see this issue mentioned more frequently not for the initial sync, but for accidental power loss or other ungraceful shutdown conditions.

@chrisguida

I will fix it in a future release; that's the plan.

Excellent, great to hear!

Please don't use --fast-sync; it eats memory and is experimental.

This problem does not only present during initial sync. We have already experienced corrupted databases on a couple of devices that were already synced.

@MattDHill

Any update on this issue and #155? Start9 is still very excited to get Fulcrum onto StartOS, but not as long as ungraceful shutdowns necessitate resyncs.

Is there any update on that issue as well as the issues related to "fast-sync" discussed above?

@greenm01

greenm01 commented Sep 9, 2023

I lost power this morning and my Fulcrum database is now corrupted. It took several days to sync on my SSD. For the time being I will switch back to electrs until this issue is resolved.

@fabiolameira

Hello 👋

I ran into this problem when trying to synchronize my Fulcrum Server. The process was consuming too much RAM until it was killed by the OOM Killer (Out of Memory killer), causing the program to be closed forcefully, and corrupting my fulcrum_db.

I tried with different settings in fulcrum.conf:

fast-sync = 8192 | 4096 | 2048 | 1024 | 512
db_max_open_files = 400 | 200 | 100 | 50 | 40

And it always ended up failing and corrupting the db.

For context, this is my setup:
OS: Ubuntu Server 22.04.3 LTS
Processor: i5-6500
RAM: 16GB
Disk: 2TB SSD

I compiled Fulcrum myself following the instructions detailed in the project's README.md and I didn't understand why this was happening, as it's not the first time I've synchronized a Fulcrum Server and it's never happened to me before.

Since on other occasions I had used pre-compiled images from the project and this had never happened, I thought it must be related to the way I compiled the project.

It was then that I noticed this:

$ Fulcrum -v
Fulcrum 1.9.8 (Release d4b3fa1)
Protocol: version min: 1.4, version max: 1.5.2
compiled: gcc 11.4.0
jemalloc: unavailable
Qt: version 5.15.3
rocksdb: version 6.14.6-ed43161
simdjson: version 0.6.0
ssl: OpenSSL 3.0.2 Mar 15, 2021
zmq: libzmq version: 4.3.4, cppzmq version: 4.7.1

jemalloc is unavailable when I run the $ Fulcrum -v command.
Since there was no jemalloc installed on the system, the project was using the system memory allocator and not jemalloc. I immediately thought that the problem might be related to this, as the system allocator might not be able to manage RAM usage as expected.

To solve the problem, I installed jemalloc with the following command:

$ sudo apt update
$ sudo apt install libjemalloc-dev

I verified the installation by running:

$ pkg-config --modversion jemalloc

Then I verified that the linker flag for jemalloc exists by running:

$ pkg-config --cflags --libs jemalloc

The output should include -ljemalloc.

Then I recompiled the project. To do this, I ran the following commands:

# This will generate the Makefile linking our jemalloc
$ qmake LIBS+=-ljemalloc

This should print something like this:

Project MESSAGE: CLI overrides: LIBS=-ljemalloc
Project MESSAGE: ZMQ version: 4.3.4
Project MESSAGE: rocksdb: using static lib
Project MESSAGE: jemalloc: using CLI override
Project MESSAGE: Including embedded secp256k1
Project MESSAGE: Installation dir prefix is /usr/local

Then I ran the following command to build using the Makefile:

# Build using the number of cores available on your machine
$ make -j $(nproc)

Then just run:

# This will install Fulcrum into /usr/local/bin
$ make install

Finally, to check if jemalloc is being used by Fulcrum, run this command again:

$ Fulcrum -v

And you should see something like:

Fulcrum 1.9.8 (Release d4b3fa1)
Protocol: version min: 1.4, version max: 1.5.2
compiled: gcc 11.4.0
jemalloc: version 5.2.1-0-gea6b3e9
Qt: version 5.15.3
rocksdb: version 6.14.6-ed43161
simdjson: version 0.6.0
ssl: OpenSSL 3.0.2 Mar 15, 2021
zmq: libzmq version: 4.3.4, cppzmq version: 4.7.1

Since my Fulcrum installation started using jemalloc as its memory allocator, I have never had any problems with the OOM killer again, neither during synchronization nor during normal use after it was synchronized.

Hope this helps 🙏
