RDS/IB: VRPC DELAY / OSS RECONNECT CAUSES 5 MINUTE STALL ON PORT FAILURE
This problem occurs when the user is notified that an RDMA write + bcopy
message completed successfully, but the peer application never receives
the bcopy message. It shows up during a port down/up test: the RDMA write
appears to succeed while the bcopy message fails. RDS should not return a
successful completion status to the user in this case.

When RDS sends an RDMA operation followed by a bcopy message, the user
notification is supposed to follow approach #3 in the comment below:
/* If the user asked for a completion notification on this
 * message, we can implement three different semantics:
 *  1.  Notify when we received the ACK on the RDS message
 *      that was queued with the RDMA. This provides reliable
 *      notification of RDMA status at the expense of a one-way
 *      packet delay.
 *  2.  Notify when the IB stack gives us the completion event for
 *      the RDMA operation.
 *  3.  Notify when the IB stack gives us the completion event for
 *      the accompanying RDS messages.
 * Here, we implement approach #3. To implement approach #2,
 * we would need to take an event for the rdma WR. To implement #1,
 * don't call rds_rdma_send_complete at all, and fall back to the notify
 * handling in the ACK processing code.
 */
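Unfortunately, the user is notified before the bcopy send status is known:
as soon as the RDMA write completes, the user gets a notification, even
though the subsequent bcopy send eventually fails.

The fix is to delay signaling the completion of the RDMA op until the
bcopy send completes. In rds_ib_xmit_rdma(), rds_ib_set_wr_signal_state()
is now skipped for the RDMA work request when op->op_notify is set, so the
RDMA WR no longer requests a signaled completion for the notification.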
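A small model makes the difference between the three semantics concrete for
the port down/up case described above (RDMA write completes successfully,
bcopy send fails, no ACK arrives). The types and the user_notification()
function are hypothetical illustrations, not RDS code. Approach #3 is
modeled under the assumption of a reliable connected (RC) QP, where work
requests complete in posting order and a failed WR causes later WRs to be
flushed in error, so the bcopy status also covers the preceding RDMA WR.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; none of these are RDS kernel definitions. */
enum wc_status { WC_SUCCESS, WC_FAILED };

enum notify_semantic {
	NOTIFY_ON_ACK,       /* #1: peer ACK of the accompanying RDS message */
	NOTIFY_ON_RDMA_CQE,  /* #2: completion event for the RDMA WR */
	NOTIFY_ON_BCOPY_CQE, /* #3: completion event for the bcopy send WR */
};

enum user_result { USER_NOT_NOTIFIED, USER_OK, USER_ERROR };

/* What happened to one "RDMA write followed by bcopy message" operation. */
struct op_outcome {
	enum wc_status rdma_status;  /* CQE status of the RDMA write WR */
	enum wc_status bcopy_status; /* CQE status of the bcopy send WR */
	bool ack_received;           /* peer ACKed the RDS message */
};

/* Status the user would see under each notification semantic. */
static enum user_result user_notification(enum notify_semantic sem,
					  struct op_outcome o)
{
	switch (sem) {
	case NOTIFY_ON_ACK:
		/* Reliable, but waits for the ACK (and never fires without one). */
		return o.ack_received ? USER_OK : USER_NOT_NOTIFIED;
	case NOTIFY_ON_RDMA_CQE:
		/* Only the RDMA WR status is known here; the bcopy may still fail. */
		return o.rdma_status == WC_SUCCESS ? USER_OK : USER_ERROR;
	case NOTIFY_ON_BCOPY_CQE:
		/*
		 * Assumes an RC QP: the bcopy WR was posted after the RDMA WR,
		 * so its status reflects both operations.
		 */
		return o.bcopy_status == WC_SUCCESS ? USER_OK : USER_ERROR;
	}
	return USER_NOT_NOTIFIED;
}
```

In this model, notifying on the RDMA completion (approach #2) reports
success even though the peer never got the message, which is the reported
symptom; notifying on the bcopy completion (approach #3) reports the failure.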

Orabug: 22847528

Signed-off-by: Venkat Venkatsubra <[email protected]>

Acked-by: Rama Nichanamatlu <[email protected]>

Orabug: 27364391

(cherry picked from commit 804df7a)
cherry-pick-repo=linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Signed-off-by: Somasundaram Krishnasamy <[email protected]>

Orabug: 33590097

UEK6 => UEK7

(cherry picked from commit 9bca09b)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Venkat Venkatsubra authored and jfvogel committed Dec 3, 2021
1 parent c11665a commit 9dc52eb
1 changed file with 1 addition and 1 deletion: net/rds/ib_send.c
@@ -970,7 +970,7 @@ int rds_ib_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
 		send->s_queued = jiffies;
 		send->s_op = NULL;
 
-		if (!op->op_remote_complete)
+		if (!op->op_remote_complete && !op->op_notify)
 			nr_sig += rds_ib_set_wr_signal_state(ic, send, op->op_notify);
 
 		send->s_wr.opcode = op->op_write ? IB_WR_RDMA_WRITE : IB_WR_RDMA_READ;
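The effect of the one-line change can be sketched as follows. The counter
logic is modeled on the upstream rds_ib_set_wr_signal_state() helper, which
marks a WR signaled either when notify is set or after a batch of unsignaled
WRs; the UEK version may differ. The names model_ib_conn,
model_set_wr_signal_state() and rdma_wr_signal() are hypothetical and used
only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-connection signaling state. */
struct model_ib_conn {
	int unsignaled_wrs; /* WRs left before one must be signaled */
	int max_unsig_wrs;  /* stands in for the max-unsignaled-WRs sysctl */
};

/* Returns 1 if this WR is marked signaled, 0 otherwise. */
static int model_set_wr_signal_state(struct model_ib_conn *ic, bool notify)
{
	/* Signal every (max_unsig_wrs + 1)th WR, or whenever notify is set. */
	if (ic->unsignaled_wrs-- == 0 || notify) {
		ic->unsignaled_wrs = ic->max_unsig_wrs;
		return 1;
	}
	return 0;
}

/* Condition guarding the helper call for the RDMA WR, before/after the patch. */
static int rdma_wr_signal(struct model_ib_conn *ic, bool remote_complete,
			  bool notify, bool patched)
{
	bool call = patched ? (!remote_complete && !notify) : !remote_complete;

	return call ? model_set_wr_signal_state(ic, notify) : 0;
}
```

With the patch, the helper is only reached when op_notify is false, so its
notify argument is always false for the RDMA WR and only the periodic
batching can mark that WR signaled.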