
Is public archive or quickstart image for v20 good now for horizon ingestion?! #557

Closed
jun0tpyrc opened this issue Jan 30, 2024 · 9 comments


@jun0tpyrc

What version are you using?

quickstart: 85a2c8b

What did you do?

Used stellar/quickstart, tuning only two things:

export HISTORY_RETENTION_COUNT=1500000
export PER_HOUR_RATE_LIMIT="0"
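
For reference, passing those variables to the quickstart container looks roughly like this; the image tag, --pubnet flag, port mapping, and volume path here are illustrative assumptions, not taken from this report:

# Sketch only: tag, network flag, ports, and volume are assumptions
docker run -d \
  -p 8000:8000 \
  -e HISTORY_RETENTION_COUNT=1500000 \
  -e PER_HOUR_RATE_LIMIT=0 \
  -v /opt/stellar:/opt/stellar \
  stellar/quickstart:latest --pubnet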

What did you expect to see?

nodes can get in sync

What did you see instead?

  • Ingestion fails in a loop. Even after pruning the bucket / bucket-cache directories for Horizon's captive core, I get the same results (multiple times):
time="2024-01-29T23:42:29.889Z" level=info msg="Processing ledger entry changes" pid=86 processed_entries=36500000 progress="83.95%" sequence=50141695 service=ingest source=historyArchive
time="2024-01-29T23:45:38.843Z" level=info msg="History: Catching up to ledger 50141759: Download & apply checkpoints: num checkpoints left to apply:0 (100% done)" pid=86 service=ingest s
ubservice=stellar-core
time="2024-01-29T23:45:43.082Z" level=info msg="History: Catching up to ledger 50142591: downloading ledger files 13/13 (100%)" pid=86 service=ingest subservice=stellar-core
time="2024-01-29T23:45:46.321Z" level=info msg="History: Catching up to ledger 50142591: Download & apply checkpoints: num checkpoints left to apply:13 (0% done)" pid=86 service=ingest su
bservice=stellar-core
time="2024-01-29T23:45:46.321Z" level=info msg="History: Catching up to ledger 50142591: Download & apply checkpoints: num checkpoints left to apply:13 (0% done)" pid=86 service=ingest su
bservice=stellar-core
time="2024-01-29T23:46:22.160Z" level=info msg="History: Catching up to ledger 50142591: Download & apply checkpoints: num checkpoints left to apply:12 (7% done)" pid=86 service=ingest su
bservice=stellar-core
...
(it got broken at a different boundary and the processed_entries loop restarted; for example:)
...
time="2024-01-29T23:53:21.015Z" level=info msg="Processing ledger entry changes" pid=86 processed_entries=100000 progress="0.35%" sequence=50142591 service=ingest source=historyArchive
@jun0tpyrc jun0tpyrc added the bug label Jan 30, 2024
@Shaptic
Contributor

Shaptic commented Jan 30, 2024

Can you upload an unabridged chunk of the Horizon and Stellar Core logs, at least where the restart is occurring?

@jun0tpyrc
Author

jun0tpyrc commented Jan 31, 2024

stellar-core would get in sync with the network head, but Horizon can't ingest:

  "horizon_version": "horizon-v2.28.0-(built-from-source)",
  "core_version": "v20.1.0",
  "ingest_latest_ledger": 0,
  "history_latest_ledger": 0,
  "history_latest_ledger_closed_at": "0001-01-01T00:00:00Z",
  "history_elder_ledger": 0,
  "core_latest_ledger": 50158275,
  "network_passphrase": "Public Global Stellar Network ; September 2015",
  "current_protocol_version": 19,
  "supported_protocol_version": 20,
  "core_supported_protocol_version": 20

Truncated logs, excluding HTTP API request logs (filtered with grep -v method=GET), are in the attached file; these lines highlight the ingestion loop restarting:

time="2024-01-30T14:37:05.751Z" level=info msg="Processing ledger entry changes" pid=218 processed_entries=2700000 progress="11.36%" sequence=50151423
...
time="2024-01-30T14:38:41.316Z" level=info msg="Processing ledger entry changes" pid=218 processed_entries=50000 progress="0.34%" sequence=50151551

example-logs-fail-sync-loop.txt
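
(The loop-restart lines can be pulled out of the attachment with a filter along these lines:)

# Drop HTTP request noise, then keep only ingestion progress lines
grep -v 'method=GET' example-logs-fail-sync-loop.txt | grep 'Processing ledger entry changes'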

@Shaptic
Contributor

Shaptic commented Jan 31, 2024

Hmm... it's hard to debug because it starts off in an error state, but the logs suggest something might be up with the cache. Can you try the following? Stop Horizon, remove the bucket cache, and start it again:

supervisorctl stop horizon
rm -rf /opt/stellar/horizon/captive-core/bucket-cache/
supervisorctl start horizon

If this fixes it, it may be a bug in how the caching works. To be more certain, could you provide another chunk of logs with more entries prior to the restart? Ideally captured before trying the above.

@jun0tpyrc
Author

> Hmm... it's hard to debug because it starts off in an error state, but the logs suggest something might be up with the cache. Can you try the following? Stop Horizon, remove the bucket cache, and start it again:
>
> supervisorctl stop horizon
> rm -rf /opt/stellar/horizon/captive-core/bucket-cache/
> supervisorctl start horizon
>
> If this fixes it, it may be a bug in how the caching works. To be more certain, could you provide another chunk of logs with more entries prior to the restart? Ideally captured before trying the above.

We have tried multiple times, pruning not only /horizon/captive-core/bucket-cache/ but the whole folder structure of buckets / bucket-cache plus the Horizon DB:

bucket-cache  captive-core/buckets  stellar.db  stellar.db-shm  stellar.db-wal

It never worked over a few days.
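
Roughly, the full wipe amounts to the sketch below; the base path and glob are assumptions from the default quickstart layout, and exact names may differ:

# Sketch only: paths assume the default quickstart layout
BASE=/opt/stellar/horizon/captive-core
supervisorctl stop horizon
rm -rf "$BASE/bucket-cache" "$BASE/buckets" "$BASE"/stellar.db*
# the Horizon database was also recreated before restarting
# (that step is deployment-specific)
supervisorctl start horizon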

@jun0tpyrc
Author

jun0tpyrc commented Feb 1, 2024

FYI, the impact may not be limited to a fresh sync on v20. A teammate also reported (on a v19 -> v20 upgrade) that the "upgraded node is stuck in a bucket download + ledger process loop", which might be a similar issue.

@sreuland
Contributor

sreuland commented Feb 8, 2024

@Shaptic, can this be closed now that stellar/go#5197 has merged? It looks like it's headed into the upcoming Horizon 2.28.2.

@Shaptic
Contributor

Shaptic commented Feb 9, 2024

I think we can only close it once a new quickstart is released 👍 then @jun0tpyrc can reopen if the issue persists after upgrading.

@Shaptic
Contributor

Shaptic commented Feb 13, 2024

This should be closed by #565, please reopen if not!

@Shaptic Shaptic closed this as completed Feb 13, 2024
@jun0tpyrc
Author

Confirmed the latest quickstart image is working for a quick sync, thanks!
