Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-4802] [streaming] Remove receiverInfo once receiver is de-registered #3647

Closed

Conversation

ilayaperumalg
Copy link
Contributor

Once the streaming receiver is de-registered at executor, the ReceiverTrackerActor needs to
remove the corresponding reveiverInfo from the receiverInfo map at ReceiverTracker.

  Once the streaming receiver is de-registered at executor, the `ReceiverTrackerActor` needs to
remove the corresponding reveiverInfo from the `receiverInfo` map at `ReceiverTracker`.
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@ilayaperumalg ilayaperumalg changed the title [SPARK-4802] Remove receiverInfo once receiver is de-registered [SPARK-4802] [streaming] Remove receiverInfo once receiver is de-registered Dec 9, 2014
@JoshRosen
Copy link
Contributor

This is a good catch, since it looks like the original intent was that we'd remove receiverInfo entries since there's a receiverInfo.isEmpty check.

In fact, a much older version of this code actually had the remove call:

That file got renamed, but the remove call was preserved:

It looks like the bug was actually introduced in #540, possibly as a copy-paste error when copying repeated code between reportError and deregisterReceiver: cd12dd9#diff-5241413c0679331ccda4cf2063b4ea0fL141

@@ -152,6 +152,7 @@ class ReceiverTracker(ssc: StreamingContext, skipReceiverLaunch: Boolean = false
}
receiverInfo(streamId) = newReceiverInfo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a little suspicious to me that we assign to receiverInfo on this line only to remove the value that we set two lines later. I think it would be clearer to remove this line and change the next line to read

listenerBus.post(StreamingListenerReceiverStopped(newReceiverInfo))

instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only difference with the above approach is that, the receiverInfo (at the ReceiverTracker) is not up-to-date (if we remove this line) at least with what is being sent to the StreamingListenerBus. Does the below approach make sense?

receiverInfo -= streamId
listenerBus.post(StreamingListenerReceiverStopped(newReceiverInfo))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's essentially what I was proposing. I was suggesting

listenerBus.post(StreamingListenerReceiverStopped(newReceiverInfo))
receiverInfo.remove(streamId)

but I don't think there's going to be a huge different between this and your suggestion unless there are multi-threading concerns. Feel free to update this with your suggestion.

@JoshRosen
Copy link
Contributor

Jenkins, this is ok to test.

@SparkQA
Copy link

SparkQA commented Dec 16, 2014

Test build #24507 has started for PR 3647 at commit 3640c86.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 16, 2014

Test build #24507 has finished for PR 3647 at commit 3640c86.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Boolean)

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24507/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Dec 16, 2014

Test build #24513 has started for PR 3647 at commit 6eb97d5.

  • This patch merges cleanly.

@ilayaperumalg
Copy link
Contributor Author

@JoshRosen updated the PR.
Though this addresses the original issue reported in SPARK-4802, I still try to figure out the issue where the streaming receivers aren't terminated after the streaming context is closed in cluster mode (SPARK-2802). Instead, the ReceiverTracker's ReceiverLauncher stop() always has nonEmpty receiverInfo.

Please see my observation and comments there: https://issues.apache.org/jira/browse/SPARK-2892. There seems to be an issue where the StopReceiver message sent by the ReceiverTrackerActor isn't received by the receiver actor (setup by ReceiverSupervisorImpl) at the executor.

I don't see this issue when running in local mode.

@SparkQA
Copy link

SparkQA commented Dec 17, 2014

Test build #24513 has finished for PR 3647 at commit 6eb97d5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Boolean)

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24513/
Test PASSed.

@JoshRosen
Copy link
Contributor

This patch looks good to me, so I'm going to merge it into master, branch-1.2, branch-1.1, and branch-1.0, since it looks like #540 affects all of those branches.

@ilayaperumalg Thanks for the pointer to that other JIRA. Let's keep investigating https://issues.apache.org/jira/browse/SPARK-2892; we can continue discussion on JIRA.

@JoshRosen
Copy link
Contributor

Actually, one potential concern before I merge this: if the old never removed entries from the receiverInfo map, then it's possible that we'll get errors after this patch if we don't guard against missing entries from that map. Let me take a quick pass through the code to see whether that could be a problem...

@JoshRosen
Copy link
Contributor

Hmm, so it looks like reportError may create a new entry in the receiverInfo map if it processes an error from a previously-unknown receiver:

 /** Report error sent by a receiver */
  private def reportError(streamId: Int, message: String, error: String) {
    val newReceiverInfo = receiverInfo.get(streamId) match {
      case Some(oldInfo) =>
        oldInfo.copy(lastErrorMessage = message, lastError = error)
      case None =>
        logWarning("No prior receiver info")
        ReceiverInfo(streamId, "", null, false, "", lastErrorMessage = message, lastError = error)
    }
    receiverInfo(streamId) = newReceiverInfo
    listenerBus.post(StreamingListenerReceiverError(receiverInfo(streamId)))
    val messageWithError = if (error != null && !error.isEmpty) {
      s"$message - $error"
    } else {
      s"$message"
    }
    logWarning(s"Error reported by receiver for stream $streamId: $messageWithError")
  }

This means that we'll leak receiverInfo map entires if we ever receive errors after removing receivers. I don't think that this should ever happen, though. If we're concerned about this, I suppose we could rewrite this method to generate the new receiver info, post the new info to the listener bus, then store the info in the mapping only if there was an old entry.

@ilayaperumalg
Copy link
Contributor Author

@JoshRosen yeah, I too believe it is very unlikely. Upon deregistration of the corresponding receiver, both the blockGenerator (which is sending the reportError message) and the actor (which is responsible for sending message to the ReceiverTrackerActor are stopped by onStop at ReceiverSupervisorImpl (from the executor). Hence this should be fine IMO.

@tdas
Copy link
Contributor

tdas commented Dec 23, 2014

LGTM. Merging this.

asfgit pushed a commit that referenced this pull request Dec 23, 2014
…stered

  Once the streaming receiver is de-registered at executor, the `ReceiverTrackerActor` needs to
remove the corresponding reveiverInfo from the `receiverInfo` map at `ReceiverTracker`.

Author: Ilayaperumal Gopinathan <[email protected]>

Closes #3647 from ilayaperumalg/receiverInfo-RTracker and squashes the following commits:

6eb97d5 [Ilayaperumal Gopinathan] Polishing based on the review
3640c86 [Ilayaperumal Gopinathan] Remove receiverInfo once receiver is de-registered

(cherry picked from commit 10d69e9)
Signed-off-by: Tathagata Das <[email protected]>
@asfgit asfgit closed this in 10d69e9 Dec 23, 2014
asfgit pushed a commit that referenced this pull request Dec 23, 2014
…stered

  Once the streaming receiver is de-registered at executor, the `ReceiverTrackerActor` needs to
remove the corresponding reveiverInfo from the `receiverInfo` map at `ReceiverTracker`.

Author: Ilayaperumal Gopinathan <[email protected]>

Closes #3647 from ilayaperumalg/receiverInfo-RTracker and squashes the following commits:

6eb97d5 [Ilayaperumal Gopinathan] Polishing based on the review
3640c86 [Ilayaperumal Gopinathan] Remove receiverInfo once receiver is de-registered

(cherry picked from commit 10d69e9)
Signed-off-by: Tathagata Das <[email protected]>

Conflicts:
	streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants