
[upstream] Eventcore bug with collection during allocation #1367

Closed · kchulj opened this issue Nov 16, 2020 · 20 comments

Labels: C.Tools (An issue related to tooling / external programs), Story-Points:13 (This takes > 7 days to complete), type-bug (Things don't work as they were intended to, or the way they were intended to work doesn't make sense)

Comments

@kchulj commented Nov 16, 2020

It happened at Height 51.

Nov 16 00:09:00 eu-002 docker[9750]: core.exception.InvalidMemoryOperationError@core/exception.d(647): Invalid memory operation
Nov 16 00:09:00 eu-002 docker[9750]: ----------------
Nov 16 00:09:01 eu-002 systemd[1]: [email protected]: Main process exited, code=exited, status=139/n/a
Nov 16 00:09:01 eu-002 systemd[1]: [email protected]: Failed with result 'exit-code'.
Nov 16 00:09:01 eu-002 systemd[1]: [email protected]: Service hold-off time over, scheduling restart.
Nov 16 00:09:01 eu-002 systemd[1]: [email protected]: Scheduled restart job, restart counter is at 1.
Nov 16 00:09:01 eu-002 systemd[1]: Stopped Agora node 2.
Nov 16 00:09:01 eu-002 systemd[1]: Started Agora node 2.
Nov 16 00:09:04 eu-002 docker[7106]: 2020-11-16 00:09:04,81 Info [agora.common.TransactionPool] - Loading database from: .cache/tx_pool.dat
Nov 16 00:09:04 eu-002 docker[7106]: 2020-11-16 00:09:04,81 Info [agora.common.TransactionPool] - Loaded database from: .cache/tx_pool.dat
Nov 16 00:09:04 eu-002 docker[7106]: 2020-11-16 00:09:04,82 Info [agora.consensus.state.UTXODB] - Loading UTXO database from: .cache/utxo_set.dat
Nov 16 00:09:04 eu-002 docker[7106]: 2020-11-16 00:09:04,82 Info [agora.consensus.state.UTXODB] - Loaded database from: .cache/utxo_set.dat
Nov 16 00:09:04 eu-002 docker[7106]: 2020-11-16 00:09:04,91 Info [SCP] - LocalNode::LocalNode@GDNOD qSet: 204980
Nov 16 00:09:04 eu-002 docker[7106]: 2020-11-16 00:09:04,91 Info [agora.consensus.SCPEnvelopeStore] - Loading database from: .cache/scp_envelopes.dat
Nov 16 00:09:04 eu-002 docker[7106]: 2020-11-16 00:09:04,92 Fatal [agora.consensus.EnrollmentManager] - No preimage at height 0 for validator key 0x46883e83778481d640
Nov 16 00:09:04 eu-002 docker[7106]: core.exception.AssertError@source/agora/consensus/EnrollmentManager.d(553): No preimage at expected height
Nov 16 00:09:05 eu-002 systemd[1]: [email protected]: Main process exited, code=exited, status=139/n/a
Nov 16 00:09:05 eu-002 systemd[1]: [email protected]: Failed with result 'exit-code'.
. . .
Nov 16 00:09:20 eu-002 systemd[1]: [email protected]: Service hold-off time over, scheduling restart.
Nov 16 00:09:20 eu-002 systemd[1]: [email protected]: Scheduled restart job, restart counter is at 6.
Nov 16 00:09:20 eu-002 systemd[1]: Stopped Agora node 2.
Nov 16 00:09:20 eu-002 systemd[1]: [email protected]: Start request repeated too quickly.
Nov 16 00:09:20 eu-002 systemd[1]: [email protected]: Failed with result 'exit-code'.
Nov 16 00:09:20 eu-002 systemd[1]: Failed to start Agora node 2.
@kchulj added the type-bug label on Nov 16, 2020
@AndrejMitrovic (Contributor)

> Nov 16 00:09:04 eu-002 docker[7106]: core.exception.AssertError@source/agora/consensus/EnrollmentManager.d(553): No preimage at expected height

This is your clue.

The preimages are not being sent fast enough, but the faucet is creating transactions as fast as possible, so this scenario is bound to happen.

The real issue here is that we need to implement slashing (#1076), and also do something about missing preimages.

Maybe this is also related to the freezing issue. Have you checked if any other nodes have crashed (not just eu-002) when a freeze happens? We need to inspect the logs of all the nodes.

@kchulj (Author) commented Nov 16, 2020

That makes sense. The nodes create many blocks in a short time due to the large pool of UTXOs, which leads to the faucet falling behind and trying to catch up.

All the nodes are crashing, so it might be related to freezing.

@AndrejMitrovic (Contributor)

More specifically:

When a validator is slashed because of a missing preimage, we will also have to exclude that validator's preimage when generating the sum random number: https://github.com/bpfkorea/agora/blob/06b3d433c9105dc77e1d992baf017b368315f96c/source/agora/consensus/EnrollmentManager.d#L550

But we won't have this detection until #1076 is implemented.

Another possible issue here is that the validators are not sending their preimages fast enough. I suspect this can happen around the time a validator is about to re-enroll. It only re-enrolls one block before expiration: https://github.com/bpfkorea/agora/blob/06b3d433c9105dc77e1d992baf017b368315f96c/source/agora/node/Validator.d#L453

Normally one block should be enough time (it's 10 minutes); however, in our test system the genesis_start_time is set in the past, which means the nodes generate blocks as fast as possible rather than every 10 minutes.

So I suggest the following (a rough config sketch follows the list):

  • Change genesis_start_time in the test system configs to always start close to the wall-clock time (and all nodes should have the exact same value, no drifting)
  • Change block_interval_sec to something smaller than 10 minutes; maybe 5 seconds would be enough.
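
A minimal sketch of what those two settings might look like in a node's test config. Only the key names genesis_start_time and block_interval_sec come from this thread; the surrounding YAML layout and the concrete values are assumptions, not Agora's actual schema:

# Hypothetical test-net config fragment -- layout and values are
# illustrative; only the two key names are taken from this thread.
consensus:
  genesis_start_time: 1605571200   # identical wall-clock epoch on every node
  block_interval_sec: 5            # much shorter than the 10-minute default

Giving every node the exact same genesis_start_time is what prevents drift; the short interval keeps test runs fast without forcing a catch-up burst.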

@AndrejMitrovic (Contributor) commented Nov 16, 2020

> Change genesis_start_time in the test system configs to always start close to the wall clock time (and they should all have the exact same value, no drifting)

This will be a little harder to implement. But what could be done is to get the local clock time (on your PC) with Ansible, and then rewrite the configs' genesis_start_time when a reset is issued (a rough sketch follows below).

I suspect this will be difficult to do.
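
A hypothetical Ansible task for that idea. lineinfile and ansible_date_time are real Ansible features (the latter requires fact gathering to be enabled); the config path and key placement are assumptions:

# Hypothetical sketch -- the config path is assumed, and fact
# gathering must be enabled so ansible_date_time exists.
- name: Stamp every node's config with the same genesis_start_time
  lineinfile:
    path: /etc/agora/config.yaml   # assumed location
    regexp: '^\s*genesis_start_time:'
    # Read the clock from a single host so all nodes get one value
    line: "  genesis_start_time: {{ hostvars[groups['all'][0]].ansible_date_time.epoch }}"

Reading the timestamp from one host's facts sidesteps the drift problem entirely.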

Then there is the other alternative: change Agora so it doesn't try to catch up and generate too many blocks at once. Unfortunately, there are still many unresolved design decisions, so this kind of fix probably isn't future-proof.

@AndrejMitrovic (Contributor)

> change Agora so it doesn't try to catch up and generate too many blocks at once. Unfortunately there are still many design decisions that haven't been resolved, so this kind of fix probably isn't future-proof.

On the other hand, nodes generating blocks too fast can also become a problem in real-world scenarios, for example after a prolonged network outage. So maybe we really should change Agora not to generate blocks too fast.

@hewison-chris (Contributor)

I think the situation may improve if we upgrade to the latest build, since more than 8 transactions will be allowed in a block. However, we may then run into a different problem: blocks becoming too big.

@AndrejMitrovic (Contributor)

@kchulj already upgraded; the problem persists.

@bpalaggi added the C.Tools label on Nov 17, 2020
@kchulj (Author) commented Nov 17, 2020

> Another possible issue here is that the validators are not sending their preimages fast enough. I suspect this can happen around the time a validator is about to re-enroll. It only re-enrolls one block before expiration:

I've tested with the latest version and it seems like this is the issue. There's always a problem with the pre-image at Height 21 and the error is reproducible. Thoughts?

@AndrejMitrovic (Contributor)

> Thoughts?

Well, it depends on what we want to do. Do we implement a workaround until #1076 is implemented? You could, for example, make the faucet poll for the preimages until you get a preimage with the right distance (using getPreimage()); see the sketch below. Then it would be safe to send transactions.
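
A minimal D sketch of that faucet-side workaround. getPreimage() is named above, but its exact signature, the .distance field, and the API/Hash types are assumptions here, not Agora's actual interface:

// Hypothetical faucet-side guard. getPreimage() is mentioned in this
// thread; its signature and the .distance field are assumptions.
import core.time : seconds;
import vibe.core.core : sleep;

bool waitForPreimage (API validator, Hash enroll_key, ulong wanted_distance)
{
    foreach (_; 0 .. 30)   // bounded retries so the faucet can't spin forever
    {
        auto preimage = validator.getPreimage(enroll_key);
        if (preimage.distance >= wanted_distance)
            return true;   // preimage revealed far enough: safe to send txs
        sleep(1.seconds);
    }
    return false;          // give up and surface an error instead of crashing
}

The faucet would call this before each burst of transactions and back off whenever it returns false.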

@hewison-chris (Contributor)

Perhaps Agora could wait at least a few seconds after externalising a block before starting the next one; that way the nodes should have enough time to reveal their pre-images. We already have this wait for pre-images in the unit tests, and it is not realistic. Even if #1076 is implemented, we need to give the nodes fair time to reveal.

@kchulj (Author) commented Nov 17, 2020

Both ideas sound good. The timing definitely needs improvement, because I'm observing that the catch-up gap doesn't get smaller over time. So should the waiting be on the faucet side or the Agora side?

@hewison-chris (Contributor) commented Nov 17, 2020

Another option is to let the catch-up happen with a single transaction from the Faucet; we do this in the CI system test now. Agora should then externalise blocks at the correct interval from then on, and you can start sending many transactions. You just need to wait for the first block to be created.

@kchulj (Author) commented Nov 17, 2020

> Change genesis_start_time in the test system configs to always start close to the wall clock time (and they should all have the exact same value, no drifting)
> Change block_interval_sec to be something smaller than 10 minutes, maybe 5 seconds would be enough.

I tried some workarounds, and this one seems the simplest and most feasible.

@kchulj (Author) commented Nov 18, 2020

Based on running the test several times, the Invalid memory operation exception happens around enrollment, just after the node enrolls. It also seems that it is not triggered by the missing pre-image; rather, it is what prevents the node from revealing a pre-image.

Nov 18 01:40:50 eu-002 docker[31406]: 2020-11-18 01:40:50,881 Info [agora.consensus.protocol.Nominator] - Externalized consensus data set at
Nov 18 01:40:51 eu-002 docker[31406]: 2020-11-18 01:40:51,353 Info [agora.node.FullNode] - Accepted enrollment: { utxo: 0x4688...a3ac, seed:
Nov 18 01:40:51 eu-002 docker[31406]: 2020-11-18 01:40:51,504 Info [agora.node.FullNode] - Accepted enrollment: { utxo: 0x8c15...70e0, seed:
Nov 18 01:40:51 eu-002 docker[31406]: 2020-11-18 01:40:51,647 Info [agora.node.FullNode] - Accepted enrollment: { utxo: 0x4dde...e6ef, seed:
Nov 18 01:40:51 eu-002 docker[31406]: 2020-11-18 01:40:51,654 Info [agora.network.NetworkManager] - Received blocks [19..19]
Nov 18 01:40:51 eu-002 docker[31406]: 2020-11-18 01:40:51,806 Info [agora.node.FullNode] - Accepted enrollment: { utxo: 0x9490...85b0, seed:
Nov 18 01:40:51 eu-002 docker[31406]: 2020-11-18 01:40:51,908 Info [agora.node.FullNode] - Accepted enrollment: { utxo: 0xb20d...08eb, seed:
Nov 18 01:40:52 eu-002 docker[31406]: 2020-11-18 01:40:52,32 Info [agora.node.FullNode] - Accepted enrollment: { utxo: 0xdb39...2d85, seed:
Nov 18 01:40:57 eu-002 docker[31406]: 2020-11-18 01:40:57,558 Info [agora.node.FullNode] - Accepted transaction: Type : Payment, Inputs (1):
Nov 18 01:40:57 eu-002 docker[31406]: Outputs (1): GDAC...5K2J(2,440,000) (0x5ee5e57137e7bf9253ed369115ebceb6aea5ad706aa8b17758d072643593bd9
Nov 18 01:40:57 eu-002 docker[31406]: 2020-11-18 01:40:57,617 Info [agora.node.FullNode] - Accepted transaction: Type : Payment, Inputs (1):
Nov 18 01:40:57 eu-002 docker[31406]: Outputs (1): GDAG...3YJC(2,440,000) (0x4ef070e4e8472be5a955ebf458a6bb1db5afaac15aee59930e3a76e0d20f97d
Nov 18 01:40:57 eu-002 docker[31406]: 2020-11-18 01:40:57,675 Info [agora.node.FullNode] - Accepted transaction: Type : Payment, Inputs (1):
Nov 18 01:40:57 eu-002 docker[31406]: Outputs (1): GDAR...T4CX(162,666.6,666,666) (0xb8948895a9365d54ba0898c097b49ad933008be860a5b86d5f43b99
Nov 18 01:40:57 eu-002 docker[31406]: 2020-11-18 01:40:57,776 Info [agora.node.FullNode] - Accepted transaction: Type : Payment, Inputs (1):
Nov 18 01:40:57 eu-002 docker[31406]: Outputs (1): GDAD...HZDA(2,440,000) (0x462991b2fa88e3380be70d005f9ed6343371ac74fbd741f750681e15da81382
Nov 18 01:40:57 eu-002 docker[31406]: 2020-11-18 01:40:57,917 Info [agora.node.FullNode] - Accepted transaction: Type : Payment, Inputs (1):
Nov 18 01:40:57 eu-002 docker[31406]: Outputs (1): GDAQ...WPA7(162,666.6,666,666) (0x4d19dd7789670d22294d00ff995b071e3a9d1640e2765301a73a66d
Nov 18 01:40:57 eu-002 docker[31406]: 2020-11-18 01:40:57,983 Info [agora.node.FullNode] - Accepted transaction: Type : Payment, Inputs (1):
Nov 18 01:40:57 eu-002 docker[31406]: Outputs (1): GDAM...IMLI(162,666.6,666,666) (0x0a921c53680c46c4f5d95b9676ad43ca8880a36fa05fc1ba08a668e
Nov 18 01:41:19 eu-002 docker[31406]: 2020-11-18 01:41:19,64 Info [agora.consensus.protocol.Nominator] - Nomination started at 1605663679
Nov 18 01:41:19 eu-002 docker[31406]: 2020-11-18 01:41:19,64 Info [agora.consensus.protocol.Nominator] - agora.consensus.protocol.Nominator.
Nov 18 01:41:19 eu-002 docker[31406]: 2020-11-18 01:41:19,84 Info [agora.consensus.protocol.Nominator] - agora.consensus.protocol.Nominator.
Nov 18 01:41:20 eu-002 docker[31406]: 2020-11-18 01:41:20,32 Info [agora.consensus.protocol.Nominator] - combineCandidates: 20
Nov 18 01:41:20 eu-002 docker[31406]: 2020-11-18 01:41:20,156 Info [agora.consensus.protocol.Nominator] - combineCandidates: 20
Nov 18 01:41:20 eu-002 docker[31406]: core.exception.InvalidMemoryOperationError@core/exception.d(647): Invalid memory operation
Nov 18 01:41:20 eu-002 docker[31406]: ----------------
Nov 18 01:41:22 eu-002 systemd[1]: [email protected]: Main process exited, code=exited, status=139/n/a
Nov 18 01:41:22 eu-002 systemd[1]: [email protected]: Failed with result 'exit-code'.
Nov 18 01:41:22 eu-002 systemd[1]: [email protected]: Service hold-off time over, scheduling restart.
Nov 18 01:41:22 eu-002 systemd[1]: [email protected]: Scheduled restart job, restart counter is at 2.
Nov 18 01:41:22 eu-002 systemd[1]: Stopped Agora node 2.
Nov 18 01:41:22 eu-002 systemd[1]: Started Agora node 2.
Nov 18 01:41:25 eu-002 docker[13431]: 2020-11-18 01:41:25,170 Info [agora.common.TransactionPool] - Loading database from: .cache/tx_pool.da
Nov 18 01:41:25 eu-002 docker[13431]: 2020-11-18 01:41:25,171 Info [agora.common.TransactionPool] - Loaded database from: .cache/tx_pool.dat
Nov 18 01:41:25 eu-002 docker[13431]: 2020-11-18 01:41:25,171 Info [agora.consensus.state.UTXODB] - Loading UTXO database from: .cache/utxo_
Nov 18 01:41:25 eu-002 docker[13431]: 2020-11-18 01:41:25,171 Info [agora.consensus.state.UTXODB] - Loaded database from: .cache/utxo_set.da
Nov 18 01:41:25 eu-002 docker[13431]: 2020-11-18 01:41:25,633 Info [SCP] - LocalNode::LocalNode@GDNOD qSet: 204980
Nov 18 01:41:25 eu-002 docker[13431]: 2020-11-18 01:41:25,634 Info [agora.consensus.SCPEnvelopeStore] - Loading database from: .cache/scp_en

@kchulj added the status-blocked label on Nov 24, 2020
@AndrejMitrovic added this to the 2. Validator milestone on Dec 1, 2020
@AndrejMitrovic (Contributor) commented Dec 1, 2020

We finally got the backtrace: https://gist.github.com/AndrejMitrovic/4bbbdc7e66dbade63f4fdd02570b0427

Also pasted below:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.5
  * frame #0: 0x0000000100971d58 agora`_D4core9exception__T11staticErrorTCQBhQBf27InvalidMemoryOperationErrorZQBzFNaNbNiZQBw at exception.d:646:5
    frame #1: 0x0000000100971d41 agora`onInvalidMemoryOperationError(pretend_sideffect=0x0000000000000000) at exception.d:539:5
    frame #2: 0x000000010098733d agora`_D2gc4impl12conservativeQw14ConservativeGC6lockNRFNbNiZv at gc.d:162:13
    frame #3: 0x0000000100988501 agora`_D2gc4impl12conservativeQw14ConservativeGC__T9runLockedS_DQCeQCeQCcQCnQBs12mallocNoSyncMFNbmkKmxC8TypeInfoZPvS_DQEgQEgQEeQEp10mallocTimelS_DQFiQFiQFgQFr10numMallocslTmTkTmTxQCzZQFcMFNbKmKkKmKxQDsZQDl(this=0x0000000102f050c0, _param_0=0x000000010ea03610, _param_1=0x000000010ea0360c, _param_2=0x000000010ea035f8, _param_3=0x000000010ea03600) at gc.d:247:9
    frame #4: 0x0000000100988827 agora`_D2gc4impl12conservativeQw14ConservativeGC6callocMFNbmkxC8TypeInfoZPv(this=0x0000000102f050c0, size=2048, bits=16, ti=0x0000000000000000) at gc.d:438:9
    frame #5: 0x00000001009987ea agora`gc_calloc(sz=2048, ba=16, ti=0x0000000000000000) at proxy.d:184:9
    frame #6: 0x000000010097b133 agora`_D4core6memory2GC6callocFNaNbmkxC8TypeInfoZPv(sz=2048, ba=16, ti=0x0000000000000000) at memory.d:551:9
    frame #7: 0x000000010099ffba agora`_D2rt3aaA12allocBucketsFNaNbNemZASQBgQBg6Bucket(dim=128) at aaA.d:197:5
    frame #8: 0x00000001009a077b agora`_D2rt3aaA4Impl6resizeMFNaNbmZv(this=0x0000000000000200, ndim=128) at aaA.d:145:9
    frame #9: 0x00000001009a08a4 agora`_D2rt3aaA4Impl6shrinkMFMxC8TypeInfoZv(this=0x0000000000000200, keyti=0x0000000100decda0) at aaA.d:139:13
    frame #10: 0x00000001009a1ac9 agora`_aaDelX(aa=AA @ 0x000000010ea03788, keyti=0x0000000100decda0, pkey="�\f") at aaA.d:632:13
    frame #11: 0x000000010074c549 agora`_D9eventcore7drivers5timer22LoopTimeoutTimerDriver10releaseRefMFNbNfSQCp6driver7TimerIDZb + 361
    frame #12: 0x000000010074d277 agora`_D9eventcore7drivers5timer22LoopTimeoutTimerDriver10cancelWaitMFNbNfSQCp6driver7TimerIDZv + 183
    frame #13: 0x000000010074c4e7 agora`_D9eventcore7drivers5timer22LoopTimeoutTimerDriver10releaseRefMFNbNfSQCp6driver7TimerIDZb + 263
    frame #14: 0x00000001006b23a9 agora`_D4vibe4core8internal7release__T13releaseHandleVAyaa6_74696d657273TS9eventcore6driver7TimerIDZQCkFNbNfQBjOCQBn7drivers5posixQBu__T16PosixEventDriverHTCQDfQBsQBn9cfrunloop18CFRunLoopEventLoopZQCjZv + 105
    frame #15: 0x00000001006a8447 agora`_D4vibe4coreQf5Timer6__dtorMFNbNfZv + 71
    frame #16: 0x00000001000dc7ee agora`_D5agora6common4Task10VibedTimer11__fieldDtorMFNbNeZv + 30
    frame #17: 0x00000001009af88b agora`rt_finalize2(p=" ��", det=false, resetMemory=false) at lifetime.d:1489:21
    frame #18: 0x00000001009af9a0 agora`rt_finalizeFromGC(p=" ��", size=48, attr=1) at lifetime.d:1523:9
    frame #19: 0x0000000100990d55 agora`_D2gc4impl12conservativeQw3Gcx5sweepMFNbZm(this=0x0000000000000000) at gc.d:2474:41
    frame #20: 0x000000010098b3c5 agora`_D2gc4impl12conservativeQw3Gcx11fullcollectMFNbbZm(this=0x0000000000000000, nostack=false) at gc.d:2667:13
    frame #21: 0x000000010098d3aa agora`_D2gc4impl12conservativeQw3Gcx10smallAllocMFNbmKmkxC8TypeInfoZPv(this=0x0000000000000000, size=1632, alloc_size=0x000000010ea03fd8, bits=2, ti=0x0000000000000000) at gc.d:1678:17
    frame #22: 0x000000010098869d agora`_D2gc4impl12conservativeQw3Gcx5allocMFNbmKmkxC8TypeInfoZPv(this=0x0000000000000000, size=1632, alloc_size=0x000000010ea03fd8, bits=2, ti=0x0000000000000000) at gc.d:1634:9
    frame #23: 0x00000001009885dd agora`_D2gc4impl12conservativeQw14ConservativeGC12mallocNoSyncMFNbmkKmxC8TypeInfoZPv(this=0x0000000102f050c0, size=1632, bits=2, alloc_size=0x000000010ea03fd8, ti=0x0000000000000000) at gc.d:389:9
    frame #24: 0x0000000100988526 agora`_D2gc4impl12conservativeQw14ConservativeGC__T9runLockedS_DQCeQCeQCcQCnQBs12mallocNoSyncMFNbmkKmxC8TypeInfoZPvS_DQEgQEgQEeQEp10mallocTimelS_DQFiQFiQFgQFr10numMallocslTmTkTmTxQCzZQFcMFNbKmKkKmKxQDsZQDl(this=0x0000000102f050c0, _param_0=0x000000010ea04000, _param_1=0x000000010ea03ffc, _param_2=0x000000010ea03fd8, _param_3=0x000000010ea03ff0) at gc.d:254:13
    frame #25: 0x0000000100988782 agora`_D2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkxC8TypeInfoZS4core6memory8BlkInfo_(this=0x0000000102f050c0, size=1632, bits=2, ti=0x0000000000000000) at gc.d:417:9
    frame #26: 0x000000010098beab agora`_DThn16_2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkxC8TypeInfoZS4core6memory8BlkInfo_ at gc.d:407:13
    frame #27: 0x000000010099877a agora`gc_qalloc(sz=1632, ba=2, ti=0x0000000000000000) at proxy.d:179:9
    frame #28: 0x000000010097b0fa agora`_D4core6memory2GC6qallocFNaNbmkxC8TypeInfoZSQBqQBo8BlkInfo_(sz=1632, ba=2, ti=0x0000000000000000) at memory.d:524:9
    frame #29: 0x00000001007cba14 agora`_D3std5array__T8AppenderTAyaZQo13ensureAddableMFmZ9__lambda3MFNaNbNeZS4core6memory8BlkInfo_ at array.d:3386:39
    frame #30: 0x00000001007cb6e2 agora`_D3std5array__T8AppenderTAyaZQo13ensureAddableMFNaNbNfmZv(this=0x00000001226fed00, nelems=1) at array.d:3386:13
    frame #31: 0x00000001001b1956 agora`_D3std5array__T8AppenderTAyaZQo__T3putTaZQhMFNaNbNfaZv + 38
    frame #32: 0x000000010079e458 agora`_D4vibe4data4json__T15writeJsonStringTS3std5array__T8AppenderTAyaZQoVbi0ZQCbFNfKQBqISQDfQDdQDb4JsonmZv + 1288
    frame #33: 0x000000010079e436 agora`_D4vibe4data4json__T15writeJsonStringTS3std5array__T8AppenderTAyaZQoVbi0ZQCbFNfKQBqISQDfQDdQDb4JsonmZv + 1254
    frame #34: 0x000000010079e436 agora`_D4vibe4data4json__T15writeJsonStringTS3std5array__T8AppenderTAyaZQoVbi0ZQCbFNfKQBqISQDfQDdQDb4JsonmZv + 1254
    frame #35: 0x000000010079e436 agora`_D4vibe4data4json__T15writeJsonStringTS3std5array__T8AppenderTAyaZQoVbi0ZQCbFNfKQBqISQDfQDdQDb4JsonmZv + 1254
    frame #36: 0x000000010079bd88 agora`_D4vibe4data4json4Json8toStringMxFNeZAya + 136
    frame #37: 0x0000000100048bb1 agora`_D4scpd5types11Stellar_SCP12SCPStatement10_pledges_t8toStringMxFNeZAya + 609
    frame #38: 0x00000001001e36e1 agora`_D4vibe4data13serialization__T18serializeValueImplTSQByQBw4json14JsonSerializerSQDaQCyQCw13DefaultPolicyZ__T21serializeValueDeducedTS4scpd5types11Stellar_SCP12SCPStatement10_pledges_tZQCyFNfKQFkKQClZv + 17
    frame #39: 0x00000001001e2919 agora`_D4vibe4data13serialization__T18serializeValueImplTSQByQBw4json14JsonSerializerSQDaQCyQCw13DefaultPolicyZ__T14serializeValueTS4scpd5types11Stellar_SCP12SCPStatement10_pledges_tZQCrFNfKQFdKQClZv + 9
    frame #40: 0x00000001001e263d agora`_D4vibe4data13serialization__T18serializeValueImplTSQByQBw4json14JsonSerializerSQDaQCyQCw13DefaultPolicyZ__T21serializeValueDeducedTS4scpd5types11Stellar_SCP12SCPStatementZQCmFNfKQEyKQBzZv + 317
    frame #41: 0x00000001001e21d9 agora`_D4vibe4data13serialization__T18serializeValueImplTSQByQBw4json14JsonSerializerSQDaQCyQCw13DefaultPolicyZ__T14serializeValueTS4scpd5types11Stellar_SCP12SCPStatementZQCfFNfKQErKQBzZv + 9
    frame #42: 0x00000001001e206d agora`_D4vibe4data13serialization__T18serializeValueImplTSQByQBw4json14JsonSerializerSQDaQCyQCw13DefaultPolicyZ__T21serializeValueDeducedTS4scpd5types11Stellar_SCP11SCPEnvelopeZQClFNfKQExKQByZv + 125
    frame #43: 0x00000001001e1fe9 agora`_D4vibe4data13serialization__T18serializeValueImplTSQByQBw4json14JsonSerializerSQDaQCyQCw13DefaultPolicyZ__T14serializeValueTS4scpd5types11Stellar_SCP11SCPEnvelopeZQCeFNfKQEqKQByZv + 9
    frame #44: 0x00000001001e1fd9 agora`_D4vibe4data13serialization__T19serializeWithPolicyTSQBzQBx4json14JsonSerializerSQDbQCzQCx13DefaultPolicyTS4scpd5types11Stellar_SCP11SCPEnvelopeZQElFNfKQDwKQByZv + 9
    frame #45: 0x00000001001e1fc9 agora`_D4vibe4data13serialization__T9serializeTSQBoQBm4json14JsonSerializerTS4scpd5types11Stellar_SCP11SCPEnvelopeZQDbFNfKQCxKQByZv + 9
    frame #46: 0x00000001001e1fa2 agora`_D4vibe4data13serialization__T9serializeTSQBoQBm4json14JsonSerializerTS4scpd5types11Stellar_SCP11SCPEnvelopeZQDbFNfKQBuZSQEpQEnQDb4Json + 66
    frame #47: 0x00000001001e1f54 agora`_D4vibe4data4json__T15serializeToJsonTS4scpd5types11Stellar_SCP11SCPEnvelopeZQCfFNfKQBuZSQDjQDhQDf4Json + 20
    frame #48: 0x00000001001c999f agora`_D4vibe3web4rest__T19executeClientMethodTC5agora3api9Validator3APIVii1S_DQCtQCrQCq__T19RestInterfaceClientTQCoZQBa9__mixin2215receiveEnvelopeMFNfS4scpd5types11Stellar_SCP11SCPEnvelopeZ8envelopeQBwZQGwMFNfMKxSQHyQHw8internalQIe6common__T13RestInterfaceTQIdZQuMDFNfCQKc4http6client17HTTPClientRequestZvMDFNfQBqMCQLw4core6stream11InputStreamZvZv + 399
    frame #49: 0x00000001001c97fb agora`_D4vibe3web4rest__T19RestInterfaceClientTC5agora3api9Validator3APIZQBw9__mixin2215receiveEnvelopeMFNfS4scpd5types11Stellar_SCP11SCPEnvelopeZv + 123
    frame #50: 0x00000001000b50e5 agora`_D5agora7network13NetworkClientQp__T14attemptRequestS_DQCb3api9Validator3API15receiveEnvelopeMFNfS4scpd5types11Stellar_SCP11SCPEnvelopeZvVEQFhQFeQEzQFc5Throwi0VE5ocean4util3log7ILoggerQi5Leveli0TCQHmQFlQFkQFdTQEiZQGvMFNbQzKQEwAyakZv + 133
    frame #51: 0x00000001000b505a agora`_D5agora7network13NetworkClientQp12sendEnvelopeMFNbS4scpd5types11Stellar_SCP11SCPEnvelopeZ9__lambda2MFZv + 58
    frame #52: 0x00000001006a8734 agora`_D4vibe4coreQf8setTimerFNbSQu4time8DurationDFZvbZ9__lambda4MFNbNfZv + 20
    frame #53: 0x000000010020199c agora`_D4vibe4core4task12TaskFuncInfo__T3setTDFNfZvZQmMFKQmZ12callDelegateFKSQCrQCpQCnQClZv + 124
    frame #54: 0x00000001006eb212 agora`_D4vibe4core4task12TaskFuncInfo4callMFZv + 18
    frame #55: 0x00000001006eab3b agora`_D4vibe4core4task9TaskFiber3runMFNbZv + 523
    frame #56: 0x000000010097ea7f agora`_D4core6thread7context8Callable6opCallMFZv(this=0x0000000000000002) at context.d:46:17
    frame #57: 0x000000010097ec0e agora`_D4core6thread5fiber5Fiber3runMFZv(this=0x000000010403c200) at fiber.d:1951:9
    frame #58: 0x000000010097eb43 agora`fiber_entryPoint at fiber.d:190:13

@Geod24 (Collaborator) commented Dec 1, 2020

Here is a minimal reproduction of the crash:

import core.memory;

size_t[size_t] glob;

class Foo
{
    size_t count;

    this (size_t entries) @safe
    {
        this.count = entries;
        foreach (idx; 0 .. entries)
            glob[idx] = idx;
    }

    // The destructor mutates a global AA. When it runs as a GC finalizer,
    // glob.remove() can shrink the AA's bucket array, which allocates --
    // and allocating during a collection raises InvalidMemoryOperationError
    // (see _aaDelX -> shrink -> resize -> allocBuckets in the backtrace).
    ~this () @safe
    {
        foreach (idx; 0 .. this.count)
            glob.remove(idx);
    }
}

void main ()
{
    bar();
    GC.collect(); // The destructor needs to run from a GC collection to trigger the bug
}

void bar () @safe
{
    Foo f = new Foo(16);
}

I guess we need to patch eventcore not to use an AA here (the releaseRef frames in the backtrace show the timer driver deleting from one); an illustrative sketch of that kind of change follows.
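
For illustration only, one possible shape of such a patch (this is not the actual change in vibe-d/eventcore#166): replace the AA with a plain slot array, so releasing a timer never mutates an AA and therefore never allocates inside a collection.

// Illustrative sketch only -- not the real eventcore patch. The point:
// freeing a slot in a preallocated array is O(1) and allocation-free,
// so it is safe to run from a destructor during a GC collection.
struct TimerSlot
{
    bool active;
    // ... timer state ...
}

TimerSlot[] slots;   // grown only on the create/allocate path

void releaseRef (size_t id)
{
    slots[id].active = false;   // no AA shrink, no GC allocation
}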

@Geod24 (Collaborator) commented Dec 1, 2020

vibe-d/eventcore#166

AndrejMitrovic added commits to AndrejMitrovic/agora referencing this issue on Dec 2, 2020: "This uses our own branch which includes a bugfix for bosagora#1367"
@AndrejMitrovic removed the status-blocked label on Dec 2, 2020
AndrejMitrovic added a commit referencing this issue on Dec 2, 2020: "This uses our own branch which includes a bugfix for #1367"
@AndrejMitrovic (Contributor)
Resolved on our side with #1397

I'm turning this into a tracking issue for when eventcore itself is fixed.

@AndrejMitrovic changed the title from "Agora emits Invalid Memory Operation exception and crashes" to "[upstream] Eventcore bug with collection during allocation" on Dec 2, 2020
@kchulj added the Story-Points:13 label on Dec 2, 2020
@Geod24 (Collaborator) commented Jan 26, 2021

Closing since upgrading the compiler will fix this.

@Geod24 closed this as completed on Jan 26, 2021