Python microlibrary for map-reduce testing/prototyping

The map-reduce approach is used in distributed computing. However, deploying real map-reduce tools such as Hadoop is too complicated for those who just want to practice solving simple tasks with this approach.

This library uses Python's built-in map(), functools.reduce() and generators to implement a map-reduce pipeline. It also includes a simple feature suited to learning and debugging: printing the execution step by step. It is not intended for production use.
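To illustrate how a pipeline can be assembled from these pieces, here is a minimal sketch. It is not the library's source: run_stage and run_pipeline are hypothetical names, and the sketch assumes that stages run in the order they are listed, that a reduce stage groups values by key in first-seen order, and that list indices serve as the initial keys (as the traces in the Usage section suggest).

```python
from functools import reduce
from itertools import chain

# Sketch only: run_stage/run_pipeline are hypothetical, not the library's API.

def run_stage(pairs, stage):
    """Apply one ("map" | "reduce", func) stage to an iterable of (key, value) pairs."""
    kind, func = stage
    if kind == "map":
        # map() calls the mapper lazily on each pair; every call returns a
        # generator, and chain.from_iterable flattens them into one stream
        return chain.from_iterable(map(lambda kv: func(*kv), pairs))
    # Shuffle: group values by key, keeping keys in first-seen order
    groups = {}
    for k, v in pairs:
        groups.setdefault(k, []).append(v)
    # Reduce: call the reducer once per key with the list of its values
    return chain.from_iterable(func(k, vs) for k, vs in groups.items())


def run_pipeline(stages, data):
    # enumerate() turns the input list into (index, element) pairs;
    # functools.reduce threads that stream through each stage in order
    return reduce(run_stage, stages, enumerate(data))


# The "most repeated number" example as plain generator functions
def m1(k, v):
    yield v, 1

def r1(k, v):
    yield k, sum(v)

def m2(k, v):
    yield 'all', (k, v)

def r2(k, v):
    # max() returns the first pair with the highest count
    yield 'max', max(v, key=lambda kv: kv[1])

stages = [("map", m1), ("reduce", r1), ("map", m2), ("reduce", r2)]
print(list(run_pipeline(stages, [1, 2, 3, 1, 2, 1, 4, 5, 6])))  # [('max', (1, 3))]
```

Here r2 uses the built-in max() instead of an explicit loop; like a strict `>` comparison, max() keeps the first maximal element, so ties resolve the same way.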

Installation

You can copy the single file mapreduce/mapreduce.py into your project; it has no dependencies.

Alternatively, install it with pip:

pip install -e git+https://github.com/File5/simple-mapreduce#egg=simple-mapreduce

To uninstall it later:

pip uninstall simple-mapreduce

Usage

Example task: find the number that is repeated most often

from mapreduce import MapReduceTask

# actually, (verbose=True, lazy=False) are default parameters
t = MapReduceTask(verbose=True, lazy=False)

# stages run in the order they are registered, so the order matters
@t.map
def m1(k, v):
    yield v, 1

@t.reduce
def r1(k, v):
    yield k, sum(v)

@t.map
def m2(k, v):
    yield 'all', (k, v)

@t.reduce
def r2(k, v):
    km, vm = None, None
    for ki, vi in v:
        if vm is None or vi > vm:
            km, vm = ki, vi
    yield 'max', (km, vm)

x = [1,2,3,1,2,1,4,5,6]
print(list(t(x)))

The output is as follows:

m1: (0, 1) -> (1, 1)
m1: (1, 2) -> (2, 1)
m1: (2, 3) -> (3, 1)
m1: (3, 1) -> (1, 1)
m1: (4, 2) -> (2, 1)
m1: (5, 1) -> (1, 1)
m1: (6, 4) -> (4, 1)
m1: (7, 5) -> (5, 1)
m1: (8, 6) -> (6, 1)
r1: (1, [1, 1, 1]) -> (1, 3)
r1: (2, [1, 1]) -> (2, 2)
r1: (3, [1]) -> (3, 1)
r1: (4, [1]) -> (4, 1)
r1: (5, [1]) -> (5, 1)
r1: (6, [1]) -> (6, 1)
m2: (1, 3) -> ('all', (1, 3))
m2: (2, 2) -> ('all', (2, 2))
m2: (3, 1) -> ('all', (3, 1))
m2: (4, 1) -> ('all', (4, 1))
m2: (5, 1) -> ('all', (5, 1))
m2: (6, 1) -> ('all', (6, 1))
r2: ('all', [(1, 3), (2, 2), (3, 1), (4, 1), (5, 1), (6, 1)]) -> ('max', (1, 3))
[('max', (1, 3))]
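The step-by-step trace above follows a simple pattern: one line per produced pair, formatted as name: (key, value) -> output. A hypothetical wrapper that produces lines in the same format might look like the sketch below; traced is an illustrative name, not part of the library, and the real verbose mode may be implemented differently.

```python
import functools

def traced(func, log=print):
    """Hypothetical helper: wrap a mapper or reducer so that every pair it
    yields is logged as 'name: (key, value) -> output'."""
    @functools.wraps(func)
    def wrapper(k, v):
        for out in func(k, v):
            # one log line per yielded pair, so a mapper that yields several
            # pairs repeats its input line, as in the word-count trace
            log(f"{func.__name__}: {(k, v)} -> {out}")
            yield out
    return wrapper


def m1(k, v):
    yield v, 1

def r1(k, v):
    yield k, sum(v)

# Prints: m1: (0, 1) -> (1, 1)
list(traced(m1)(0, 1))
# Prints: r1: (1, [1, 1, 1]) -> (1, 3)
list(traced(r1)(1, [1, 1, 1]))
```

Passing a different log callable (for example a list's append method) collects the trace instead of printing it.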

Word count task

from mapreduce import MapReduceTask

t = MapReduceTask(verbose=True, lazy=False)

@t.map
def m1(k, v):
    for word in v.split(' '):
        yield word, 1

@t.reduce
def r1(k, v):
    yield k, sum(v)

x = ["hello world word world of words"]
print(list(t(x)))

The output is as follows:

m1: (0, 'hello world word world of words') -> ('hello', 1)
m1: (0, 'hello world word world of words') -> ('world', 1)
m1: (0, 'hello world word world of words') -> ('word', 1)
m1: (0, 'hello world word world of words') -> ('world', 1)
m1: (0, 'hello world word world of words') -> ('of', 1)
m1: (0, 'hello world word world of words') -> ('words', 1)
r1: ('hello', [1]) -> ('hello', 1)
r1: ('world', [1, 1]) -> ('world', 2)
r1: ('word', [1]) -> ('word', 1)
r1: ('of', [1]) -> ('of', 1)
r1: ('words', [1]) -> ('words', 1)
[('hello', 1), ('world', 2), ('word', 1), ('of', 1), ('words', 1)]
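To sanity-check the word-count result without the library, the same counts can be computed with collections.Counter from the standard library. word_count is a hypothetical helper used only for comparison; it splits on a single space exactly as m1 does.

```python
from collections import Counter

def word_count(lines):
    """Count words across all lines; keys keep first-seen order."""
    counts = Counter()
    for line in lines:
        # split(' ') mirrors m1 in the word-count example
        counts.update(line.split(' '))
    return list(counts.items())

print(word_count(["hello world word world of words"]))
# [('hello', 1), ('world', 2), ('word', 1), ('of', 1), ('words', 1)]
```

Note that split(' ') yields empty strings when words are separated by more than one space, so word_count(["a  b"]) also counts ''. Calling str.split() with no argument splits on runs of whitespace and avoids this; the same change would apply to m1.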
