
Is public archive or quickstart image for v20 good now for horizon ingestion?! #557

Closed
jun0tpyrc opened this issue Jan 30, 2024 · 9 comments


@jun0tpyrc

What version are you using?

quickstart: 85a2c8b

What did you do?

Used stellar/quickstart, tuning only two things:

export HISTORY_RETENTION_COUNT=1500000
export PER_HOUR_RATE_LIMIT="0"
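
For reference, passing those variables to the quickstart container looks roughly like this; the image tag, --pubnet flag, port mapping, and volume path here are illustrative assumptions, not taken from this report:

# Sketch only: tag, network flag, ports, and volume are assumptions
docker run -d \
  -p 8000:8000 \
  -e HISTORY_RETENTION_COUNT=1500000 \
  -e PER_HOUR_RATE_LIMIT=0 \
  -v /opt/stellar:/opt/stellar \
  stellar/quickstart:latest --pubnet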

What did you expect to see?

nodes can get in sync

What did you see instead?

  • Ingestion fails in a loop. Even after pruning the bucket / bucket-cache directories for Horizon's captive core, I get the same results (multiple times):
time="2024-01-29T23:42:29.889Z" level=info msg="Processing ledger entry changes" pid=86 processed_entries=36500000 progress="83.95%" sequence=50141695 service=ingest source=historyArchive
time="2024-01-29T23:45:38.843Z" level=info msg="History: Catching up to ledger 50141759: Download & apply checkpoints: num checkpoints left to apply:0 (100% done)" pid=86 service=ingest s
ubservice=stellar-core
time="2024-01-29T23:45:43.082Z" level=info msg="History: Catching up to ledger 50142591: downloading ledger files 13/13 (100%)" pid=86 service=ingest subservice=stellar-core
time="2024-01-29T23:45:46.321Z" level=info msg="History: Catching up to ledger 50142591: Download & apply checkpoints: num checkpoints left to apply:13 (0% done)" pid=86 service=ingest su
bservice=stellar-core
time="2024-01-29T23:45:46.321Z" level=info msg="History: Catching up to ledger 50142591: Download & apply checkpoints: num checkpoints left to apply:13 (0% done)" pid=86 service=ingest su
bservice=stellar-core
time="2024-01-29T23:46:22.160Z" level=info msg="History: Catching up to ledger 50142591: Download & apply checkpoints: num checkpoints left to apply:12 (7% done)" pid=86 service=ingest su
bservice=stellar-core
...
(it got broken at a different boundary and the processed_entries loop restarted; for example:)
...
time="2024-01-29T23:53:21.015Z" level=info msg="Processing ledger entry changes" pid=86 processed_entries=100000 progress="0.35%" sequence=50142591 service=ingest source=historyArchive
@jun0tpyrc jun0tpyrc added the bug label Jan 30, 2024
@Shaptic
Contributor

Shaptic commented Jan 30, 2024

Can you upload an unabridged chunk of the Horizon and Stellar Core logs, at least where the restart is occurring?

@jun0tpyrc
Author

jun0tpyrc commented Jan 31, 2024

stellar-core would get in sync with the network head, but Horizon can't ingest:

  "horizon_version": "horizon-v2.28.0-(built-from-source)",
  "core_version": "v20.1.0",
  "ingest_latest_ledger": 0,
  "history_latest_ledger": 0,
  "history_latest_ledger_closed_at": "0001-01-01T00:00:00Z",
  "history_elder_ledger": 0,
  "core_latest_ledger": 50158275,
  "network_passphrase": "Public Global Stellar Network ; September 2015",
  "current_protocol_version": 19,
  "supported_protocol_version": 20,
  "core_supported_protocol_version": 20

Truncated logs, excluding HTTP API request logs (filtered with grep -v method=GET), are in the attached file; these lines highlight the ingestion loop restarting:

time="2024-01-30T14:37:05.751Z" level=info msg="Processing ledger entry changes" pid=218 processed_entries=2700000 progress="11.36%" sequence=50151423
...
time="2024-01-30T14:38:41.316Z" level=info msg="Processing ledger entry changes" pid=218 processed_entries=50000 progress="0.34%" sequence=50151551

example-logs-fail-sync-loop.txt
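
(The loop-restart lines can be pulled out of the attachment with a filter along these lines:)

# Drop HTTP request noise, then keep only ingestion progress lines
grep -v 'method=GET' example-logs-fail-sync-loop.txt | grep 'Processing ledger entry changes'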

@Shaptic
Contributor

Shaptic commented Jan 31, 2024

Hmm... it's hard to debug because it starts off in an error state, but the logs suggest something might be up with the cache. Can you try the following? Stop Horizon, remove the bucket cache, and start it again:

supervisorctl stop horizon
rm -rf /opt/stellar/horizon/captive-core/bucket-cache/
supervisorctl start horizon

If this fixes it, it may be a bug in how the caching works. To be more certain, could you provide another chunk of logs with more entries prior to the restart? Ideally captured before trying the above.

@jun0tpyrc
Author

> Hmm... it's hard to debug because it starts off in an error state, but the logs suggest something might be up with the cache. Can you try the following? Stop Horizon, remove the bucket cache, and start it again:
>
> supervisorctl stop horizon
> rm -rf /opt/stellar/horizon/captive-core/bucket-cache/
> supervisorctl start horizon
>
> If this fixes it, it may be a bug in how the caching works. To be more certain, could you provide another chunk of logs with more entries prior to the restart? Ideally captured before trying the above.

We have tried multiple times, pruning not only /horizon/captive-core/bucket-cache/ but the whole folder structure of buckets / bucket-cache plus the Horizon DB:

bucket-cache  captive-core/buckets  stellar.db  stellar.db-shm  stellar.db-wal

It never worked over a few days.
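
Roughly, the full wipe amounts to the sketch below; the base path and glob are assumptions from the default quickstart layout, and exact names may differ:

# Sketch only: paths assume the default quickstart layout
BASE=/opt/stellar/horizon/captive-core
supervisorctl stop horizon
rm -rf "$BASE/bucket-cache" "$BASE/buckets" "$BASE"/stellar.db*
# the Horizon database was also recreated before restarting
# (that step is deployment-specific)
supervisorctl start horizon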

@jun0tpyrc
Author

jun0tpyrc commented Feb 1, 2024

FYI, the impact may not be limited to a fresh sync on v20. A teammate also reported (on a v19 -> v20 upgrade) that the "upgraded node is stuck in a bucket download + ledger process loop", which might be a similar issue.

@sreuland
Contributor

sreuland commented Feb 8, 2024

@Shaptic, can this be closed now that stellar/go#5197 has merged? It looks like it's headed into the upcoming Horizon 2.28.2.

@Shaptic
Contributor

Shaptic commented Feb 9, 2024

I think we can only close it once a new quickstart is released 👍 then @jun0tpyrc can reopen if the issue persists after upgrading.

@Shaptic
Contributor

Shaptic commented Feb 13, 2024

This should be closed by #565, please reopen if not!

@Shaptic Shaptic closed this as completed Feb 13, 2024
@jun0tpyrc
Author

Confirmed the latest quickstart image is working for a quick sync, thanks!
