[SPARK-12507][Streaming][Document]Expose closeFileAfterWrite and allowBatching configurations for Streaming #10453
Test build #48251 has finished for PR 10453 at commit
Let's improve the title of items like this. "Update x" is never descriptive.
<td><code>spark.streaming.driver.writeAheadLog.closeFileAfterWrite</code></td>
<td>false</td>
<td>
Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't
I'd say `on the driver` instead of `in driver`.
I have a few comments on phrasing but otherwise it lgtm
</tr>
<tr>
<td><code>spark.streaming.driver.writeAheadLog.allowBatching</code></td>
<td>false</td>
for me: the default value is `true`.
That's why I want to expose this one since the behavior is different from 1.5.0.
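For readers who want to pin this setting rather than rely on the default, a minimal `spark-defaults.conf` sketch follows. Only the key name comes from this PR; the assumption, based on the comments above, is that the default in this release is `true` and that `false` turns batching of driver write ahead log records off.

```properties
# Illustrative entry, not part of this PR's diff.
# Assumption: "false" disables batching of driver WAL records.
spark.streaming.driver.writeAheadLog.allowBatching  false
```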
@BenFradet Addressed. Thanks for reviewing.
Test build #48436 has finished for PR 10453 at commit
LGTM
Maybe we can also include that
Test build #48516 has finished for PR 10453 at commit
@zsxwing Thanks! LGTM
<td><code>spark.streaming.receiver.writeAheadLog.closeFileAfterWrite</code></td>
<td>false</td>
<td>
Whether to close the file after writing a write ahead log record on the receivers. Because S3
"Because S3 .... " --> Set this to 'true' when you want to use S3 (or any file system that does not support flushing) for the metadata WAL at the driver.
Test build #48971 has finished for PR 10453 at commit
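As a sketch of what the suggested wording asks users to do, a `spark-defaults.conf` fragment for a deployment that keeps its write ahead logs on S3 might look like the following. The two `closeFileAfterWrite` keys are the ones this PR documents, and `spark.streaming.receiver.writeAheadLog.enable` is the existing switch that turns the receiver write ahead log on. Setting all three together is an assumption about the deployment, not something the PR prescribes.

```properties
# Sketch only: illustrative values for a WAL directory on S3
# (or any file system that does not support flushing).
spark.streaming.receiver.writeAheadLog.enable               true
spark.streaming.driver.writeAheadLog.closeFileAfterWrite    true
spark.streaming.receiver.writeAheadLog.closeFileAfterWrite  true
```

The same keys can also be passed per application with `--conf key=value` on `spark-submit`.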
@@ -1985,7 +1985,11 @@ To run a Spark Streaming applications, you need to have the following.
 to increase aggregate throughput. Additionally, it is recommended that the replication of the
 received data within Spark be disabled when the write ahead log is enabled as the log is already
 stored in a replicated storage system. This can be done by setting the storage level for the
-input stream to `StorageLevel.MEMORY_AND_DISK_SER`.
+input stream to `StorageLevel.MEMORY_AND_DISK_SER`. While using S3 (or any file system that
+does not support flushing) for Write Ahead Logs, please remember to enable
nit: "Write Ahead Logs" is not in caps elsewhere in this text, so please be consistent.
Just one more comment, then LGTM.
Test build #48980 has finished for PR 10453 at commit
LGTM. Merging this to master and 1.6. Thanks!
…owBatching configurations for Streaming

/cc tdas brkyvz

Author: Shixiong Zhu <[email protected]>

Closes #10453 from zsxwing/streaming-conf.

(cherry picked from commit c94199e)
Signed-off-by: Tathagata Das <[email protected]>
/cc @tdas @brkyvz