
Mentioned performance problem with WAL
tdas committed Dec 11, 2014
1 parent 7787209 commit f746951
Showing 2 changed files with 18 additions and 10 deletions.
9 changes: 5 additions & 4 deletions docs/streaming-kafka-integration.md
@@ -52,7 +52,8 @@ data loss on failures. This receiver is automatically used when the write ahead
 may reduce the receiving throughput of individual Kafka receivers compared to the unreliable
 receivers, but this can be corrected by running
 [more receivers in parallel](streaming-programming-guide.html#level-of-parallelism-in-data-receiving)
-to increase aggregate throughput. Also it is strongly recommended that the replication in the
-storage level be disabled when the write ahead log is enabled because the log is already stored
-in a replicated storage system. This is done using `KafkaUtils.createStream(...,
-StorageLevel.MEMORY_AND_DISK_SER)`.
+to increase aggregate throughput. Additionally, it is recommended that the replication of the
+received data within Spark be disabled when the write ahead log is enabled as the log is already stored
+in a replicated storage system. This can be done by setting the storage level for the input
+stream to `StorageLevel.MEMORY_AND_DISK_SER` (that is, use
+`KafkaUtils.createStream(..., StorageLevel.MEMORY_AND_DISK_SER)`).
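The change above recommends a serialized, non-replicated storage level once the write ahead log is enabled, because the log already lives in a replicated storage system. As a rough illustration of that reasoning (plain Python, not Spark API; `durable_copies` is a helper invented for this sketch), one can count the fault-tolerant copies of a received block:

```python
def durable_copies(in_spark_replicas: int, wal_enabled: bool) -> int:
    """Count fault-tolerant copies of a received block.

    Replicas kept inside Spark protect against executor loss; a write
    ahead log stored in a replicated system (e.g. HDFS) contributes one
    additional durable copy on its own.
    """
    return in_spark_replicas + (1 if wal_enabled else 0)

# Without a WAL, the default replicated storage level keeps 2 copies in Spark.
assert durable_copies(2, wal_enabled=False) == 2
# With the WAL enabled, a single serialized copy in Spark
# (MEMORY_AND_DISK_SER) plus the log gives equivalent protection,
# while avoiding the memory and network cost of a second replica.
assert durable_copies(1, wal_enabled=True) == 2
```

This is only a way to see why in-Spark replication becomes redundant under a WAL, not a model of Spark's actual block management.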
19 changes: 13 additions & 6 deletions docs/streaming-programming-guide.md
@@ -1568,13 +1568,20 @@ To run a Spark Streaming applications, you need to have the following.
 with Mesos.
 
 
-- *Configuring write ahead logs (Spark 1.2+)* - Starting for Spark 1.2, we have introduced a new
-feature of write ahead logs. If enabled, all the data received from a receiver gets written into
+- *[Experimental in Spark 1.2] Configuring write ahead logs* - In Spark 1.2,
+we have introduced a new experimental feature of write ahead logs for achieving strong
+fault-tolerance guarantees. If enabled, all the data received from a receiver gets written into
 a write ahead log in the configuration checkpoint directory. This prevents data loss on driver
-recovery, thus allowing zero data loss guarantees which is discussed in detail in the
-[Fault-tolerance Semantics](#fault-tolerance-semantics) section. Enable this by setting the
-[configuration parameter](configuration.html#spark-streaming)
-`spark.streaming.receiver.writeAheadLogs.enable` to `true`.
+recovery, thus ensuring zero data loss (discussed in detail in the
+[Fault-tolerance Semantics](#fault-tolerance-semantics) section). This can be enabled by setting
+the [configuration parameter](configuration.html#spark-streaming)
+`spark.streaming.receiver.writeAheadLogs.enable` to `true`. However, these stronger semantics may
+come at the cost of the receiving throughput of individual receivers. This can be corrected by running
+[more receivers in parallel](#level-of-parallelism-in-data-receiving)
+to increase aggregate throughput. Additionally, it is recommended that the replication of the
+received data within Spark be disabled when the write ahead log is enabled as the log is already
+stored in a replicated storage system. This can be done by setting the storage level for the
+input stream to `StorageLevel.MEMORY_AND_DISK_SER`.
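The write-ahead-log behaviour described in the added text (every received record is logged to the checkpoint directory before anything else happens to it, and the log is replayed on driver recovery) can be sketched in a few lines of plain Python. This is a conceptual toy, not Spark's implementation; `Receiver` and `recover` are names invented for the sketch:

```python
import os
import tempfile

class Receiver:
    """Toy receiver that appends every record to a write ahead log
    before buffering it, so a crash never loses acknowledged data."""

    def __init__(self, checkpoint_dir: str):
        self.log_path = os.path.join(checkpoint_dir, "receivedData.log")
        self.buffered = []  # in-memory copy, lost on failure

    def receive(self, record: str) -> None:
        with open(self.log_path, "a") as log:
            log.write(record + "\n")  # durable write happens first
        self.buffered.append(record)  # only then buffer in memory

def recover(checkpoint_dir: str) -> list:
    """On driver recovery, replay the log to rebuild the received data."""
    path = os.path.join(checkpoint_dir, "receivedData.log")
    with open(path) as log:
        return [line.rstrip("\n") for line in log]

ckpt = tempfile.mkdtemp()
r = Receiver(ckpt)
for rec in ["a", "b", "c"]:
    r.receive(rec)
r.buffered.clear()  # simulate losing all in-memory state on driver failure
assert recover(ckpt) == ["a", "b", "c"]  # zero data loss after recovery
```

The ordering is the whole point: because the durable append precedes the in-memory buffering, any record the receiver has accepted is recoverable, which is what makes the extra in-Spark replication redundant.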

### Upgrading Application Code
{:.no_toc}
