mapreduce-simple


A simple, minimal implementation of the MapReduce distributed computation framework introduced by Jeffrey Dean and Sanjay Ghemawat at Google Inc. in 2004.

MapReduce is a programming model and processing technique for distributed computation on large datasets. A job runs in two main phases: the Map phase, which processes input files in parallel and emits intermediate key/value pairs, and the Reduce phase, which groups those pairs by key and applies a summary operation (such as counting or summing) to each group.
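As a concrete illustration of the two phases, here is a minimal, self-contained Go sketch of a word count job run sequentially. The `KeyValue` type and the `Map`, `Reduce`, and `RunSequential` functions are hypothetical names chosen for this example; they are not taken from this repository's plugin or driver, whose signatures may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
	"unicode"
)

// KeyValue is a hypothetical intermediate pair type (illustrative only).
type KeyValue struct {
	Key   string
	Value string
}

// Map splits a document into words and emits ("word", "1") for each word.
func Map(contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool { return !unicode.IsLetter(r) })
	kvs := make([]KeyValue, 0, len(words))
	for _, w := range words {
		kvs = append(kvs, KeyValue{Key: w, Value: "1"})
	}
	return kvs
}

// Reduce receives every value emitted for one key; since each value is "1",
// the word count is simply the number of values.
func Reduce(key string, values []string) string {
	return strconv.Itoa(len(values))
}

// RunSequential runs the Map phase over all documents, sorts the intermediate
// pairs by key (the "shuffle"), then calls Reduce once per distinct key.
func RunSequential(docs []string) map[string]string {
	var intermediate []KeyValue
	for _, doc := range docs {
		intermediate = append(intermediate, Map(doc)...)
	}
	sort.Slice(intermediate, func(i, j int) bool { return intermediate[i].Key < intermediate[j].Key })

	out := make(map[string]string)
	for i := 0; i < len(intermediate); {
		// Collect the run of pairs that share intermediate[i].Key.
		j := i
		var values []string
		for j < len(intermediate) && intermediate[j].Key == intermediate[i].Key {
			values = append(values, intermediate[j].Value)
			j++
		}
		out[intermediate[i].Key] = Reduce(intermediate[i].Key, values)
		i = j
	}
	return out
}

func main() {
	counts := RunSequential([]string{"the quick fox", "the lazy dog"})
	fmt.Println(counts["the"], counts["fox"]) // prints "2 1"
}
```

Sorting the intermediate pairs stands in for the shuffle step: it places equal keys next to each other so each `Reduce` call sees every value for its key. In the distributed version, the Map calls run in parallel across workers instead of in a single loop.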

Status: Completed (✅)

Objective

  • A minimum viable implementation that produces the same output as a sequential MapReduce application.
  • Graceful exit of all spawned threads/goroutines.
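One common Go pattern for the graceful-exit objective is sketched below, assuming a simple in-process worker pool (the actual workers in this repository may coordinate differently, for example over RPC). Closing the task channel tells every worker to stop, and a `sync.WaitGroup` lets the caller wait until all goroutines have actually returned. `runWorkers` is a hypothetical helper, not part of this project.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers starts nWorkers goroutines that pull tasks from a shared channel,
// apply work, and send results back. Closing taskCh is the shutdown signal:
// each worker's range loop ends once the channel is closed and drained.
func runWorkers(nWorkers int, tasks []int, work func(int) int) []int {
	taskCh := make(chan int)
	// Buffered so workers never block on send, even after the last task.
	resultCh := make(chan int, len(tasks))
	var wg sync.WaitGroup

	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done() // mark this goroutine as finished when it returns
			for t := range taskCh {
				resultCh <- work(t)
			}
		}()
	}

	for _, t := range tasks {
		taskCh <- t
	}
	close(taskCh) // no more tasks: every worker's loop terminates
	wg.Wait()     // block until all worker goroutines have returned
	close(resultCh)

	results := make([]int, 0, len(tasks))
	for r := range resultCh {
		results = append(results, r)
	}
	return results
}

func main() {
	res := runWorkers(3, []int{1, 2, 3, 4}, func(x int) int { return x * x })
	sum := 0
	for _, r := range res {
		sum += r
	}
	fmt.Println(sum) // prints 30
}
```

Results arrive in nondeterministic order, so callers that need ordering should sort them or tag each result with its task ID.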

Usage

To run the distributed MapReduce (the default target):

make

To run the sequential MapReduce (for testing/benchmarking):

make run_seq

To clean up the generated directories (this target also runs as part of the targets above):

make clean

Directory Structure

  • ./plugins: Directory where the compiled plugin (wordcounter) will be stored.
  • ./intermediates: Directory for storing intermediate files generated during the Map phase.
  • ./outputs: Directory for storing the final output files after the Reduce phase.

Additional References