
[SPARK-12507][Streaming][Document]Expose closeFileAfterWrite and allowBatching configurations for Streaming #10453

Closed

zsxwing wants to merge 5 commits into apache:master from zsxwing:streaming-conf

Conversation

@zsxwing (Member) commented Dec 23, 2015

@SparkQA commented Dec 23, 2015

Test build #48251 has finished for PR 10453 at commit 4295137.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member) commented Dec 29, 2015

Let's improve the title of items like this. "Update x" is never descriptive.

@zsxwing changed the title from "[SPARK-12507][Streaming][Document]Update Streaming configurations for 1.6" to "[SPARK-12507][Streaming][Document]Expose closeFileAfterWrite and allowBatching configurations for Streaming" on Dec 29, 2015
```html
<td><code>spark.streaming.driver.writeAheadLog.closeFileAfterWrite</code></td>
<td>false</td>
<td>
Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't
```
Inline review comment from a Contributor:

I'd say on the driver instead of in driver.

@BenFradet (Contributor) commented:

I have a few comments on phrasing, but otherwise it LGTM.

```html
</tr>
<tr>
<td><code>spark.streaming.driver.writeAheadLog.allowBatching</code></td>
<td>false</td>
```
Inline review comment from @zsxwing (Member Author):

for me: the default value is true.

That's why I want to expose this one since the behavior is different from 1.5.0.
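To make the thread concrete, here is a minimal Scala sketch of how the two driver-side WAL settings documented in this PR would be applied through `SparkConf`. The app name and master are placeholders, not from this PR; per the discussion above, `closeFileAfterWrite` defaults to false while `allowBatching` defaults to true in 1.6.

```scala
import org.apache.spark.SparkConf

// Hypothetical app name/master, used only to make the sketch self-contained.
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("WALConfigSketch")
  // Close the WAL file after every record written on the driver
  // (needed for stores like S3 that do not support flushing).
  .set("spark.streaming.driver.writeAheadLog.closeFileAfterWrite", "true")
  // Batch WAL writes on the driver; defaults to true in 1.6 per the
  // discussion above, set explicitly here for illustration.
  .set("spark.streaming.driver.writeAheadLog.allowBatching", "true")
```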

@zsxwing (Member Author) commented Dec 29, 2015

@BenFradet Addressed. Thanks for your review.

@SparkQA commented Dec 30, 2015

Test build #48436 has finished for PR 10453 at commit bce7a29.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@brkyvz (Contributor) commented Dec 30, 2015

LGTM

@brkyvz (Contributor) commented Dec 30, 2015

Maybe we can also mention that allowBatching is not just helpful when closeFileAfterWrite is enabled, but is also very helpful for scaling to a large number of receivers (50+, for example).

@zsxwing (Member Author) commented Dec 30, 2015

> Maybe we can also mention that allowBatching is not just helpful when closeFileAfterWrite is enabled, but is also very helpful for scaling to a large number of receivers (50+, for example).

I guess "enabled" should be "disabled"? Oops, ... misunderstood...

@SparkQA commented Dec 30, 2015

Test build #48516 has finished for PR 10453 at commit 7d9b038.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@brkyvz (Contributor) commented Dec 31, 2015

@zsxwing Thanks! LGTM

```html
<td><code>spark.streaming.receiver.writeAheadLog.closeFileAfterWrite</code></td>
<td>false</td>
<td>
Whether to close the file after writing a write ahead log record on the receivers. Because S3
```
Inline review comment from a Contributor:

"Because S3 .... " --> Set this to 'true' when you want to use S3 (or any file system that does not support flushing) for the metadata WAL at the driver.

@SparkQA commented Jan 7, 2016

Test build #48971 has finished for PR 10453 at commit 4d55b03.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```diff
@@ -1985,7 +1985,11 @@ To run a Spark Streaming applications, you need to have the following.
 to increase aggregate throughput. Additionally, it is recommended that the replication of the
 received data within Spark be disabled when the write ahead log is enabled as the log is already
 stored in a replicated storage system. This can be done by setting the storage level for the
-input stream to `StorageLevel.MEMORY_AND_DISK_SER`.
+input stream to `StorageLevel.MEMORY_AND_DISK_SER`. While using S3 (or any file system that
+does not support flushing) for Write Ahead Logs, please remember to enable
```
Inline review comment from a Contributor:

nit: "write ahead logs" is not capitalized elsewhere in this text, so please be consistent.
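To illustrate the recommendation in the changed paragraph, here is a small sketch (host, port, app name, and batch interval are placeholders) of a receiver stream whose storage level drops in-Spark replication because the WAL already stores the data in replicated storage:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("WALStorageLevelSketch") // placeholder name
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")
val ssc = new StreamingContext(conf, Seconds(1))

// With the WAL enabled, the log already lives in replicated storage, so use
// the single-copy MEMORY_AND_DISK_SER instead of the default replicated
// MEMORY_AND_DISK_SER_2.
val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
```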

@tdas (Contributor) commented Jan 7, 2016

Just one more comment, then LGTM.

@SparkQA commented Jan 7, 2016

Test build #48980 has finished for PR 10453 at commit 28a750d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas (Contributor) commented Jan 8, 2016

LGTM. Merging this to master and 1.6. Thanks!

asfgit pushed a commit that referenced this pull request Jan 8, 2016
[SPARK-12507][Streaming][Document]Expose closeFileAfterWrite and allowBatching configurations for Streaming

/cc tdas brkyvz

Author: Shixiong Zhu <[email protected]>

Closes #10453 from zsxwing/streaming-conf.

(cherry picked from commit c94199e)
Signed-off-by: Tathagata Das <[email protected]>
@asfgit closed this in c94199e on Jan 8, 2016
@zsxwing deleted the streaming-conf branch on January 8, 2016 01:39