
After each upgrade, there are always one or two heights where the CPU is full and the task cannot run #1191

Closed
LINJINTIANDE opened this issue Apr 29, 2023 · 2 comments
Assignees
Labels
kind/bug Kind: Bug

Comments

@LINJINTIANDE

Describe the bug:

After each upgrade, there are always one or two heights where CPU usage is pinned at 100% and the task cannot complete. For example, height 2809803.

Steps to Reproduce:

lily job run --storage=lily --tasks=block_header,block_parent,drand_block_entrie,id_addresses,actor,actor_state,internal_messages,internal_parsed_messages,message,block_message,receipt,message_gas_economy,derived_gas_outputs,parsed_message,multisig_approvals,multisig_transaction,miner_current_deadline_info,miner_fee_debt,miner_info,miner_locked_fund,miner_pre_commit_info,miner_sector_deal,miner_sector_event,miner_sector_infos_v7 walk --from=2809803 --to=2809803

Lily Version: 1.15.1

Machine configuration:

CPU: AuthenticAMD 7302
Memory: 1 TB
GPU: 2080
Storage: 4 TB SSD

@Terryhung
Collaborator

Terryhung commented May 4, 2023

Hi @LINJINTIANDE,

Based on our analysis, the root cause is the large number of actor changes that must be handled at this epoch. As part of the Lotus migration, a total of 2,146,633 actors were modified, so the tasks that process actor changes demand far more CPU than usual.

Given the unusually high number of actor changes, we recommend skipping this epoch and starting from 2809804.

Typically, in a normal epoch, the number of actor changes does not exceed 1000. To address this issue, we propose adding a threshold for tasks in the watch job. Specifically, if the number of actor changes exceeds 10,000, we should skip the task to prevent it from consuming the entire CPU and allow it to fail gracefully. Additionally, we plan to support the adjustment of this threshold within the walk job.
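The threshold idea described above can be sketched roughly as follows. This is only an illustration in Python, not Lily's actual code (Lily is written in Go): the names `decide_actor_task`, `TaskDecision`, and `DEFAULT_ACTOR_CHANGE_THRESHOLD` are hypothetical, and the default of 10,000 is simply the figure proposed in this comment.

```python
from dataclasses import dataclass

# Hypothetical default, taken from the 10,000 figure proposed above.
DEFAULT_ACTOR_CHANGE_THRESHOLD = 10_000


@dataclass(frozen=True)
class TaskDecision:
    """Outcome of the check: whether to run the task, and why."""
    run: bool
    reason: str


def decide_actor_task(actor_changes: int,
                      threshold: int = DEFAULT_ACTOR_CHANGE_THRESHOLD) -> TaskDecision:
    """Decide whether an actor-related task should run for one epoch.

    The task is skipped (so it can fail gracefully instead of saturating
    the CPU) only when the number of actor changes exceeds the threshold.
    """
    if actor_changes < 0:
        raise ValueError("actor_changes must be non-negative")
    if actor_changes > threshold:
        return TaskDecision(
            run=False,
            reason=f"skipped: {actor_changes} actor changes exceed threshold {threshold}",
        )
    return TaskDecision(run=True, reason="ok")


if __name__ == "__main__":
    # A typical epoch versus the migration epoch reported in this issue.
    print(decide_actor_task(1_000))
    print(decide_actor_task(2_146_633))
```

Making the threshold a parameter mirrors the plan to let it be adjusted for the walk job: a caller who does want to process a migration epoch could pass a larger value.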

Thanks!

@LINJINTIANDE
Author

Thanks for your answer. 🙏
