A simple, minimal implementation of MapReduce, the distributed computation framework introduced by Jeffrey Dean and Sanjay Ghemawat at Google in 2004.
MapReduce is a programming model and processing technique designed for distributed computing on large datasets. It consists of two main phases: the Map phase, which processes input files in parallel and emits key/value pairs, and the Reduce phase, which performs a summary operation on the mapped data, grouped by key.
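The two phases can be sketched with the classic word-count example. This is an illustrative sketch, not this repository's code: the `KeyValue` type and the `Map`/`Reduce` signatures are assumptions modeled on common MapReduce implementations (the in-memory grouping in `main` stands in for the shuffle step that a real framework performs across intermediate files).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// KeyValue is the unit of data passed from Map to Reduce.
// (Hypothetical type for illustration; not taken from this repository.)
type KeyValue struct {
	Key   string
	Value string
}

// Map splits its input into words and emits a ("word", "1") pair per word.
func Map(contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	kvs := make([]KeyValue, 0, len(words))
	for _, w := range words {
		kvs = append(kvs, KeyValue{Key: strings.ToLower(w), Value: "1"})
	}
	return kvs
}

// Reduce receives every value emitted for one key and sums the counts.
func Reduce(key string, values []string) string {
	return fmt.Sprintf("%d", len(values))
}

func main() {
	kvs := Map("the quick brown fox jumps over the lazy dog the end")

	// Group values by key: the "shuffle" between Map and Reduce.
	grouped := map[string][]string{}
	for _, kv := range kvs {
		grouped[kv.Key] = append(grouped[kv.Key], kv.Value)
	}

	// Emit results in sorted key order, like the sequential reference output.
	keys := make([]string, 0, len(grouped))
	for k := range grouped {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s %s\n", k, Reduce(k, grouped[k]))
	}
}
```

In the distributed version, the grouping shown inline here is what the intermediate files exist for: each map task partitions its output by key so reduce tasks can collect their share.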
- Minimum viable implementation that produces the same output as a sequential MapReduce application.
- Graceful exit of all forked threads / goroutines.
To run distributed MapReduce (default):

```shell
make
```

To run sequential MapReduce (for testing/benchmarking):

```shell
make run_seq
```

To clean up the generated directories (also run as part of the previous targets):

```shell
make clean
```
- ./plugins: Directory where the compiled plugin (wordcounter) will be stored.
- ./intermediates: Directory for storing intermediate files generated during the Map phase.
- ./outputs: Directory for storing the final output files after the Reduce phase.