After each upgrade, there are always one or two heights where the CPU is full and the task cannot run #1191

LINJINTIANDE · 2023-04-29T12:12:44Z

Describe the bug:

After each upgrade, there are always one or two heights where the CPU is saturated and the task cannot run. For example, height 2809803.

Steps to Reproduce:

lily job run --storage=lily --tasks=block_header,block_parent,drand_block_entrie,id_addresses,actor,actor_state,internal_messages,internal_parsed_messages,message,block_message,receipt,message_gas_economy,derived_gas_outputs,parsed_message,multisig_approvals,multisig_transaction,miner_current_deadline_info,miner_fee_debt,miner_info,miner_locked_fund,miner_pre_commit_info,miner_sector_deal,miner_sector_event,miner_sector_infos_v7 walk --from=2809803 --to=2809803

Lily Version: 1.15.1

Machine configuration:

CPU: AuthenticAMD 7302
Memory: 1 TB
GPU: 2080
Storage: 4 TB SSD

Terryhung · 2023-05-04T14:03:08Z

Hi @LINJINTIANDE,

Based on our analysis, we have identified that the root cause of the issue is the large number of actor changes that need to be handled during this epoch. Specifically, as part of the lotus migration, a total of 2,146,633 actors were modified, leading to a high demand for CPU resources for tasks related to actor changes.

Given the unusually high number of actor changes, we recommend skipping this epoch and resuming from 2809804.

In a normal epoch, the number of actor changes typically does not exceed 1,000. To address this issue, we propose adding a threshold for tasks in the watch job: if the number of actor changes exceeds 10,000, the task is skipped so that it fails gracefully instead of consuming the entire CPU. We also plan to make this threshold adjustable in the walk job.
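The threshold idea can be illustrated with a minimal sketch. This is not Lily's actual implementation (Lily is written in Go, and the real change is in #1195); the names `decide_actor_task`, `TaskDecision`, and `DEFAULT_ACTOR_CHANGE_THRESHOLD` are hypothetical, and only the 10,000 limit and the 2,146,633 count come from the discussion above.

```python
from dataclasses import dataclass

# Hypothetical default, taken from the 10,000 figure proposed in this thread.
DEFAULT_ACTOR_CHANGE_THRESHOLD = 10_000


@dataclass
class TaskDecision:
    """Outcome of the pre-check for one epoch (illustrative type, not part of Lily)."""
    epoch: int
    run: bool
    reason: str


def decide_actor_task(
    epoch: int,
    actor_changes: int,
    threshold: int = DEFAULT_ACTOR_CHANGE_THRESHOLD,
) -> TaskDecision:
    """Skip actor-related tasks when an epoch changes more actors than the threshold."""
    if actor_changes > threshold:
        # Fail fast with a clear reason instead of saturating the CPU.
        return TaskDecision(
            epoch,
            False,
            f"{actor_changes} actor changes exceed threshold {threshold}",
        )
    return TaskDecision(epoch, True, "within threshold")


if __name__ == "__main__":
    # The migration epoch reported in this issue is skipped.
    print(decide_actor_task(2809803, 2_146_633))
    # An epoch with a typical number of changes (illustrative count) runs normally.
    print(decide_actor_task(2809804, 1_000))
```

Passing the threshold as a parameter mirrors the plan to make it adjustable per job; an epoch whose change count equals the threshold exactly still runs, because the check is a strict "exceeds".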

Thanks!

LINJINTIANDE · 2023-05-04T14:47:41Z

Thanks for your answer. 🙏

LINJINTIANDE added the kind/bug label Apr 29, 2023

birdychang assigned Terryhung Apr 29, 2023

LINJINTIANDE closed this as completed May 4, 2023

Terryhung mentioned this issue May 8, 2023

feat: add filter for actor changes #1195

Merged

birdychang mentioned this issue May 31, 2023

When the task height is 1960322, there is an error out of memory allocating heap arena metadata #1063

Closed
