Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Response unpaused but never calls outgoing block hook #175

Closed
dirkmc opened this issue Jun 21, 2021 · 0 comments · Fixed by #178
Closed

Response unpaused but never calls outgoing block hook #175

dirkmc opened this issue Jun 21, 2021 · 0 comments · Fixed by #178
Labels
need/triage Needs initial labeling and prioritization

Comments

@dirkmc
Copy link
Collaborator

dirkmc commented Jun 21, 2021

It seems that under heavy load, a Graphsync response to a request for data will sometimes get stuck in a paused state.

When I run the new lotus concurrent deals test TestDealCyclesConcurrent in the itest kit with concurrency of 32, the test consistently hangs.
It appears that all the retrieval deals complete except for one or two, and the logs for these incomplete deals indicate that the problem is in Graphsync's pausing mechanism.
The branches I used to produce this state are:

  • lotus exp/rebase-itests-refactor-kit
    Includes the new TestDealCyclesConcurrent test
  • go-fil-markets dirkmc/log-validate-pull
    Adds a log line that connects the deal ID to the channel ID so we can match logs at the go-fil-markets level to the go-data-transfer level
  • go-data-transfer dirkmc/log-outgoing-block
    Adds some logging in the Graphsync transport so we can see if the sent / outgoing block hooks are called

I run the concurrent deals test in lotus:

go test ./itests -run TestDealCyclesConcurrent -v

I grep the output for each deal ID until I find the deal that hangs. Then I look for the log line that connects the deal ID to the data transfer channel ID:

2021-06-21T15:03:29.259+0200	INFO	markets-rtvl-reval	requestvalidation/requestvalidation.go:78	validate pull for deal 1624280585293275020 for data transfer channel 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052

I use that to grep the logs for both the deal ID and the channel ID so as to get all relevant logs:

grep "1624280585293275020\|1624280585292810052" /tmp/out.txt
 
2021-06-21T15:03:29.128+0200	INFO	markets	loggers/loggers.go:25	retrieval client event	{"name": "ClientEventOpen", "deal ID": "1624280585293275020", "state": "DealStatusNew", "message": ""}
2021-06-21T15:03:29.259+0200	INFO	dt-impl	impl/events.go:19	channel 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052: opened
                                                                                                                                                             ?V)????u,} to channel ID 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052
2021-06-21T15:03:29.259+0200	INFO	markets	loggers/loggers.go:25	retrieval client event	{"name": "ClientEventDealProposed", "deal ID": "1624280585293275020", "state": "DealStatusWaitForAcceptance", "message": ""}
2021-06-21T15:03:29.259+0200	INFO	dt_graphsync	graphsync/graphsync.go:476	12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052, pull request queued, req=&{BCid:bafk2bzacedr7tnpc62gvo2wcaxswfab4yduzmkv44kodcxbnpxrhaqhdbidk6 Type:0 Paus:false Part:false Pull:true Stor:0xc01035b758 Vouch:0xc01035b788 VTyp:RetrievalDealProposal/1 XferID:1624280585292810052 RestartChannel:--0}
2021-06-21T15:03:29.259+0200	DEBUG	dt_graphsync	graphsync/graphsync.go:520	12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052: received request for data (pull)
2021-06-21T15:03:29.259+0200	INFO	dt-impl	impl/events.go:300	channel 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052: received new channel request from 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h
2021-06-21T15:03:29.259+0200	INFO	markets-rtvl-reval	requestvalidation/requestvalidation.go:78	validate pull for deal 1624280585293275020 for data transfer channel 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052
2021-06-21T15:03:29.390+0200	INFO	markets	loggers/loggers.go:30	retrieval provider event	{"name": "ProviderEventOpen", "deal ID": "1624280585293275020", "receiver": "12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h", "state": "DealStatusNew", "message": ""}
2021-06-21T15:03:29.668+0200	INFO	markets	loggers/loggers.go:30	retrieval provider event	{"name": "ProviderEventDealAccepted", "deal ID": "1624280585293275020", "receiver": "12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h", "state": "DealStatusUnsealing", "message": ""}
2021-06-21T15:03:29.736+0200	DEBUG	dt_graphsync	graphsync/graphsync.go:570	12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052: pausing graphsync response
2021-06-21T15:03:29.736+0200	DEBUG	dt_graphsync	graphsync/graphsync.go:969	12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052: received request for data
                                                                                                                                                             ?V)????u,} to channel ID 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052
2021-06-21T15:03:29.741+0200	INFO	markets	loggers/loggers.go:30	retrieval provider event	{"name": "ProviderEventUnsealComplete", "deal ID": "1624280585293275020", "receiver": "12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h", "state": "DealStatusUnsealed", "message": ""}
2021-06-21T15:03:29.805+0200	INFO	dt-impl	impl/impl.go:391	resume channel 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052
2021-06-21T15:03:29.805+0200	DEBUG	dt_graphsync	graphsync/graphsync.go:1055	12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052: unpausing response
2021-06-21T15:03:29.805+0200	WARN	dt-impl	impl/impl.go:400	Error attempting to resume 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052 at transport level: request is not paused
2021-06-21T15:03:29.998+0200	INFO	dt-impl	impl/events.go:188	channel 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052: received new response, accepting channel
2021-06-21T15:03:29.999+0200	INFO	markets	loggers/loggers.go:25	retrieval client event	{"name": "ClientEventDealAccepted", "deal ID": "1624280585293275020", "state": "DealStatusAccepted", "message": ""}
2021-06-21T15:03:31.318+0200	INFO	markets	loggers/loggers.go:25	retrieval client event	{"name": "ClientEventPaymentChannelAddingFunds", "deal ID": "1624280585293275020", "state": "DealStatusPaymentChannelAddingInitialFunds", "message": ""}
2021-06-21T15:03:32.511+0200	INFO	markets	loggers/loggers.go:25	retrieval client event	{"name": "ClientEventPaymentChannelReady", "deal ID": "1624280585293275020", "state": "DealStatusPaymentChannelAllocatingLane", "message": ""}
2021-06-21T15:03:32.725+0200	INFO	markets	loggers/loggers.go:25	retrieval client event	{"name": "ClientEventLaneAllocated", "deal ID": "1624280585293275020", "state": "DealStatusOngoing", "message": ""}

Normally after ProviderEventUnsealComplete there would be a ProviderEventBlockSent event.

This log line indicates that hookActions.PauseResponse() is called when a new request for data arrives:

2021-06-21T15:03:29.736+0200	DEBUG	dt_graphsync	graphsync/graphsync.go:570	12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052: pausing graphsync response

These log lines indicate that Graphsync.UnpauseResponse is called but that Graphsync does not believe the response is paused:

2021-06-21T15:03:29.805+0200	DEBUG	dt_graphsync	graphsync/graphsync.go:1055	12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052: unpausing response
2021-06-21T15:03:29.805+0200	WARN	dt-impl	impl/impl.go:400	Error attempting to resume 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052 at transport level: request is not paused

Therefore the Graphsync transfer remains in the paused state, the outgoing block hook is never called (see the log lines that would be output if it were called) and ProviderEventBlockSent is never fired.

@dirkmc dirkmc added the need/triage Needs initial labeling and prioritization label Jun 21, 2021
@dirkmc dirkmc changed the title Response paused then unpaused but never calls outgoing block hook Response unpaused but never calls outgoing block hook Jun 21, 2021
@raulk raulk linked a pull request Jul 19, 2021 that will close this issue
@raulk raulk removed a link to a pull request Jul 22, 2021
@raulk raulk closed this as completed Jul 22, 2021
marten-seemann pushed a commit that referenced this issue Mar 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need/triage Needs initial labeling and prioritization
Projects
None yet
2 participants