
pyaccumulo Tutorial

jt6211 edited this page Mar 20, 2013 · 1 revision

Proxy Building and Installation

svn co https://svn.apache.org/repos/asf/accumulo/branches/1.5 accumulo-1.5-src
cd accumulo-1.5-src
mvn package -P assemble -DskipTests
tar -C ../ -xvf assemble/target/apache-accumulo-1.5.0-SNAPSHOT-dist.tar.gz
cd ../apache-accumulo-1.5.0-SNAPSHOT

Now either copy over your existing Accumulo configs (accumulo-site.xml and accumulo-env.sh) or edit those two files manually. See the "CONFIGURATION" section of http://accumulo.apache.org/1.4/user_manual/Administration.html#Installation for more info.

Proxy Server configuration

Now edit proxy/proxy.properties and make sure the following settings have these values:

org.apache.accumulo.proxy.ProxyServer.useMockInstance=false
org.apache.accumulo.proxy.ProxyServer.useMiniAccumulo=false
org.apache.accumulo.proxy.ProxyServer.protocolFactory=org.apache.thrift.protocol.TCompactProtocol$Factory
org.apache.accumulo.proxy.ProxyServer.port=50096
org.apache.accumulo.proxy.ProxyServer.instancename=test
org.apache.accumulo.proxy.ProxyServer.zookeepers=localhost:2181

Make sure that the instance name and ZooKeeper hosts (the instancename and zookeepers settings) match your setup.
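
As a quick sanity check before starting the proxy, the settings above can be verified with a short script. This is only a minimal sketch and not part of pyaccumulo or Accumulo: parse_properties and check_proxy_settings are hypothetical helper names, the parser handles only simple key=value lines, and the expected values are the ones listed above.

```python
# Hypothetical helpers (not part of pyaccumulo or Accumulo) that check
# proxy.properties text against the values recommended in this tutorial.
PREFIX = "org.apache.accumulo.proxy.ProxyServer."

EXPECTED = {
    PREFIX + "useMockInstance": "false",
    PREFIX + "useMiniAccumulo": "false",
    PREFIX + "protocolFactory": "org.apache.thrift.protocol.TCompactProtocol$Factory",
    PREFIX + "port": "50096",
}

# These two must be set, but their values depend on your installation.
SITE_SPECIFIC = ("instancename", "zookeepers")


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_proxy_settings(props):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for key in sorted(EXPECTED):
        found = props.get(key)
        if found != EXPECTED[key]:
            problems.append("%s should be %r, found %r" % (key, EXPECTED[key], found))
    for name in SITE_SPECIFIC:
        if not props.get(PREFIX + name):
            problems.append("%s is not set" % (PREFIX + name))
    return problems


# Example input: a file that still has the mock instance enabled.
sample = PREFIX + "useMockInstance=true\n" + PREFIX + "port=50096\n"
for problem in check_proxy_settings(parse_properties(sample)):
    print(problem)
```

To check a real file, read proxy/proxy.properties into a string and pass it to parse_properties in place of the sample text.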

Running the proxy server

This step assumes Accumulo is already running. If it is not, see this section of the manual for more info: http://accumulo.apache.org/1.4/user_manual/Administration.html#Running

Now, from the apache-accumulo-1.5.0-SNAPSHOT directory (created by unpacking the tarball), run:

./bin/accumulo proxy -p proxy/proxy.properties

At this point you should be able to use the pyaccumulo library to access this proxy server. Clone it, install its dependencies, and open settings.py:

git clone git@github.com:accumulo/pyaccumulo.git
cd pyaccumulo
sudo pip install -r requirements.txt
export PYTHONPATH="."
vi settings.py

Make sure these settings match your setup (user/password/etc).

HOST = "localhost"
PORT = 50096
USER = 'root'
PASSWORD = 'secret'

Now run the following command. Note: this will create a table called analytics and write to it. If you already have a table named analytics, edit examples/analytics.py before running this command.

python examples/analytics.py 

You should see output like this:

Cell(row='row', cf='count', cq='cq', cv='', ts=1363777816377, val='1000')
Cell(row='row', cf='histo', cq='cq', cv='', ts=1363777816377, val='1000,2000,3000,4000,5000,6000,7000,8000,9000')
Cell(row='row', cf='max', cq='cq', cv='', ts=1363777816377, val='999')
Cell(row='row', cf='min', cq='cq', cv='', ts=1363777816377, val='0')
Cell(row='row', cf='sum', cq='cq', cv='', ts=1363777816377, val='499500')   
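
The count, min, max, and sum values above are consistent with the example writing the integers 0 through 999 into the table. That input is an assumption inferred from the output, not something stated on this page. Under that assumption, a quick check in plain Python reproduces the same numbers (the histo column is not covered here):

```python
# Assumption: the example's input values are the integers 0..999,
# inferred from the count/min/max/sum cells shown above.
values = list(range(1000))

stats = {
    "count": len(values),
    "max": max(values),
    "min": min(values),
    "sum": sum(values),
}

for name in sorted(stats):
    print("%s=%d" % (name, stats[name]))
# prints:
# count=1000
# max=999
# min=0
# sum=499500
```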

If you instead see a traceback like the following, then either your proxy server isn't running or the values in settings.py are wrong.

Traceback (most recent call last):
  File "examples/analytics.py", line 21, in <module>
    conn = Accumulo(host=settings.HOST, port=settings.PORT, user=settings.USER, password=settings.PASSWORD)
  File "/Users/jtrost/workspace/pyaccumulo/pyaccumulo/__init__.py", line 138, in __init__
    self.transport.open()
  File "/Library/Python/2.7/site-packages/thrift/transport/TTransport.py", line 261, in open
    return self.__trans.open()
  File "/Library/Python/2.7/site-packages/thrift/transport/TSocket.py", line 99, in open
    message=message)
thrift.transport.TTransport.TTransportException: Could not connect to localhost:50096
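
To tell the two causes apart, it can help to test whether anything is listening on the proxy port at all, independently of pyaccumulo. The following is a minimal sketch using only the standard library; proxy_reachable is a hypothetical helper name, and the host and port are the ones used in settings.py above.

```python
import socket


def proxy_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Connection refused, host not found, or timed out.
        return False
    sock.close()
    return True


# Host and port taken from settings.py in this tutorial.
print(proxy_reachable("localhost", 50096))
```

If this prints False, nothing is accepting connections on that host and port, so check that the proxy is running and that HOST and PORT match proxy.properties. If it prints True, something is listening there, and this particular error is less likely to be the problem; the check does not validate USER or PASSWORD.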