Minimal implementation of the MapReduce distributed computation framework in Go

vismaysur/mapreduce-simple

mapreduce-simple

A simple, minimal implementation of the MapReduce distributed computation framework introduced by Jeffrey Dean and Sanjay Ghemawat at Google in 2004.

MapReduce is a programming model and processing technique designed for distributed computing on large datasets. It consists of two main phases: the Map phase, which processes input files/data in parallel, and the Reduce phase, which performs a summary operation on the mapped data by key.
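To make the two phases concrete, here is a minimal sequential word-count sketch. It is an illustration only, not this repository's code: the `KeyValue` type and the `Map`, `Reduce`, and `RunSequential` names and signatures are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
	"unicode"
)

// KeyValue is a hypothetical intermediate pair emitted by Map.
type KeyValue struct {
	Key   string
	Value string
}

// Map splits the input into lowercase words and emits (word, "1") for each.
func Map(contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	kvs := make([]KeyValue, 0, len(words))
	for _, w := range words {
		kvs = append(kvs, KeyValue{Key: strings.ToLower(w), Value: "1"})
	}
	return kvs
}

// Reduce summarizes all values collected for one key; here it counts them.
func Reduce(key string, values []string) string {
	return strconv.Itoa(len(values))
}

// RunSequential applies Map to every input, groups the emitted values by
// key, and then calls Reduce once per key.
func RunSequential(inputs []string) map[string]string {
	grouped := make(map[string][]string)
	for _, in := range inputs {
		for _, kv := range Map(in) {
			grouped[kv.Key] = append(grouped[kv.Key], kv.Value)
		}
	}
	out := make(map[string]string, len(grouped))
	for k, vs := range grouped {
		out[k] = Reduce(k, vs)
	}
	return out
}

func main() {
	result := RunSequential([]string{"the quick fox", "The lazy dog and the fox"})

	// Sort the keys so the printed order is stable.
	keys := make([]string, 0, len(result))
	for k := range result {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s %s\n", k, result[k])
	}
}
```

In a distributed run, the calls to `Map` would be spread across workers and the grouping step would go through intermediate files rather than an in-memory map.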

Status: Completed (✅)

Objective

  • Minimum viable implementation that produces the same output as a sequential MapReduce application.
  • Graceful exit of all forked threads / goroutines.
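One common way to get a graceful exit of worker goroutines in Go is to hand out tasks over a channel, close the channel when no work remains, and wait on a `sync.WaitGroup`. The sketch below illustrates that pattern under assumed names (`runWorkers`, `process`); it is not taken from this repository, which may coordinate its workers differently.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers is a hypothetical helper: it starts n workers that consume
// tasks until the channel is closed, and returns only after every worker
// goroutine has exited.
func runWorkers(n int, tasks []string, process func(string) int) int {
	taskCh := make(chan string)
	var wg sync.WaitGroup
	var mu sync.Mutex
	total := 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// The range loop ends when taskCh is closed and drained,
			// so each worker returns on its own.
			for t := range taskCh {
				r := process(t)
				mu.Lock()
				total += r
				mu.Unlock()
			}
		}()
	}

	for _, t := range tasks {
		taskCh <- t
	}
	close(taskCh) // signal that no more tasks will arrive
	wg.Wait()     // block until all workers have exited
	return total
}

func main() {
	total := runWorkers(3, []string{"a", "bb", "ccc"}, func(s string) int {
		return len(s)
	})
	fmt.Println(total)
}
```

Because `wg.Wait()` returns only after every worker has called `Done`, no goroutine is left running when `runWorkers` returns.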

Usage

To run distributed MapReduce (default):

make

To run sequential MapReduce (for testing/benchmarking):

make run_seq

To clean up the created directories (also run as part of the previous targets):

make clean

Directory Structure

  • ./plugins: Directory where the compiled plugin (wordcounter) will be stored.
  • ./intermediates: Directory for storing intermediate files generated during the Map phase.
  • ./outputs: Directory for storing the final output files after the Reduce phase.
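The original MapReduce paper assigns each intermediate key to a reduce task with `hash(key) mod R`. The sketch below shows how that could map onto files under `./intermediates`. The `partition` and `intermediateName` helpers, the FNV hash, and the `mr-<map>-<reduce>` file naming are all assumptions for illustration; this repository may lay out its files differently.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"path/filepath"
)

// partition picks the reduce task for a key using hash(key) mod nReduce.
// The FNV-1a hash is an assumption; any deterministic hash would do.
func partition(key string, nReduce int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(nReduce))
}

// intermediateName builds a hypothetical path for the file that map task
// mapID writes for reduce task reduceID.
func intermediateName(mapID, reduceID int) string {
	return filepath.Join("intermediates", fmt.Sprintf("mr-%d-%d", mapID, reduceID))
}

func main() {
	const nReduce = 4
	for _, key := range []string{"apple", "banana", "cherry"} {
		r := partition(key, nReduce)
		fmt.Printf("%s -> %s\n", key, intermediateName(0, r))
	}
}
```

Because the same key always hashes to the same partition, every value for that key ends up with a single reduce task, regardless of which map task emitted it.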
