pyaccumulo

A python client library for Apache Accumulo

Licensed under the Apache 2.0 License

This is still a work in progress. Pull requests are welcome.

Requirements

A running Accumulo cluster
The new Accumulo Thrift Proxy (https://issues.apache.org/jira/browse/ACCUMULO-482) running. See https://github.com/accumulo/pyaccumulo/wiki/pyaccumulo-Tutorial for setup details.
Thrift python lib installed

Installation

pip install thrift
git clone [email protected]:accumulo/pyaccumulo.git

Basic Usage

Creating a connection

from pyaccumulo import Accumulo, Mutation, Range
conn = Accumulo(host="my.proxy.hostname", port=50096, user="root", password="secret")

Basic Table Operations

table = "mytable"
if not conn.table_exists(table):
    conn.create_table(table)

Writing Mutations with a BatchWriter (Batched and optimized for throughput)

wr = conn.create_batch_writer(table)
for num in range(0, 1000):
    m = Mutation("row_%d"%num)
    m.put(cf="cf1", cq="cq1", val="%d"%num)
    m.put(cf="cf2", cq="cq2", val="%d"%num)
    wr.add_mutation(m)
wr.close()

Simple writes (immediate and syncronous)

for num in range(0, 1000):
    m = Mutation("row_%d"%num)
    m.put(cf="cf1", cq="cq1", val="%d"%num)
    m.put(cf="cf2", cq="cq2", val="%d"%num)
    conn.write(table, m)

Scanning a Table

# scan the entire table
for entry in conn.scan(table):
    print entry.row, entry.cf, entry.cq, entry.cv, entry.ts, entry.val

# scan() and batch_scan() return a named tuple of (row, cf, cq, cv, ts, val)

# scan only a portion of the table
for entry in conn.scan(table, scanrange=Range(srow='row_1', erow='row_2'), cols=[["cf1"]]):
    print entry.row, entry.cf, entry.cq, entry.cv, entry.ts, entry.val

Using a Batch Scanner

# scan the entire table with 10 threads
for entry in conn.batch_scan(table, numthreads=10):
    print entry.row, entry.cf, entry.cq, entry.cv, entry.ts, entry.val

Running the Examples

Run these commands once before running any of the examples.

cd pyaccumulo
vi settings.py # change these settings to match your proxy HOST/PORT and USER/PASSWORD
export PYTHONPATH="."

Example of simple ingest and scanning

python examples/simple.py

Example use of Combiners for Analytics

python examples/analytics.py

Example use Intersecting Iterator for search.

# index all the files in the pyaccumulo directory
$ python examples/intersecting_iterator/ingest.py ii_file_search *
Creating table: ii_file_search
indexing file examples/analytics.py
indexing file examples/regex_search.py
indexing file examples/simple.py
indexing file examples/indexed_doc_iterator/ingest.py
...

# Now search the "ii_file_search" table for files that contain "assert_called_with" and "assertEquals"
python examples/intersecting_iterator/search.py ii_file_search assert_called_with assertEquals
tests/core_tests.py
tests/iterator_tests.py

Example use Document Intersecting Iterator for search. This indexes the data in a slightly different way so the Iterator returns the document value as opposed to having to fetch it separately.

# index all the files in the pyaccumulo directory
$ python examples/indexed_doc_iterator/ingest.py dociter_file_search *
Creating table: dociter_file_search
indexing file examples/analytics.py
indexing file examples/regex_search.py
indexing file examples/simple.py
indexing file examples/indexed_doc_iterator/ingest.py
...

# Now search the "dociter_file_search" table for files that contain "hashlib" and "search_terms"
python examples/indexed_doc_iterator/search.py dociter_file_search hashlib search_terms
examples/indexed_doc_iterator/search.py
examples/intersecting_iterator/search.py

Example use of Regex Filter for regex based searching

python examples/regex_search.py

Name		Name	Last commit message	Last commit date
Latest commit History 47 Commits
examples		examples
pyaccumulo		pyaccumulo
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
requirements.txt		requirements.txt
rpm-install-script.sh		rpm-install-script.sh
settings.py		settings.py
setup.py		setup.py
version.py		version.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pyaccumulo

Requirements

Installation

Basic Usage

Creating a connection

Basic Table Operations

Writing Mutations with a BatchWriter (Batched and optimized for throughput)

Simple writes (immediate and syncronous)

Scanning a Table

Using a Batch Scanner

Running the Examples

About

Releases

Packages

Contributors 4

Languages

License

revelc/pyaccumulo

Folders and files

Latest commit

History

Repository files navigation

pyaccumulo

Requirements

Installation

Basic Usage

Creating a connection

Basic Table Operations

Writing Mutations with a BatchWriter (Batched and optimized for throughput)

Simple writes (immediate and syncronous)

Scanning a Table

Using a Batch Scanner

Running the Examples

About

Topics

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages