Run all

docker-compose build
docker-compose up

HDFS

$ hdfscli upload --alias=<...> -f /local/file.txt file.txt

$ hdfscli --alias=docker

Welcome to the interactive HDFS python shell.
The HDFS client is available as `CLIENT`.

>>> CLIENT
<InsecureClient(url='http://namenode:50070')>
>>> CLIENT.list("/")
['user']
>>> CLIENT.list("/user")
['root']
>>> CLIENT.list("/user/root")
['file.txt']
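
The shell above goes through the `hdfs` Python package, which talks to the namenode's WebHDFS REST endpoint. As a rough sketch of the same directory listing without `hdfscli`, the snippet below builds the request by hand. It assumes WebHDFS is enabled at the URL the shell reports and that the HDFS user is `root` (inferred from `/user/root`); the helper names are hypothetical and not part of this repository.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: same namenode URL the interactive shell reports above.
NAMENODE_URL = "http://namenode:50070"


def webhdfs_url(path, op, user="root", base=NAMENODE_URL):
    """Build a WebHDFS REST URL for `path` (hypothetical helper)."""
    query = urlencode({"op": op, "user.name": user})
    return f"{base}/webhdfs/v1{quote(path)}?{query}"


def names_from_liststatus(payload):
    """Extract the entry names from a LISTSTATUS JSON response."""
    statuses = payload["FileStatuses"]["FileStatus"]
    return [status["pathSuffix"] for status in statuses]


def list_dir(path):
    """List a directory, roughly what CLIENT.list(path) returns."""
    with urlopen(webhdfs_url(path, "LISTSTATUS")) as response:
        return names_from_liststatus(json.load(response))
```

Calling `list_dir("/user/root")` from a container that can resolve `namenode` should give the same names as `CLIENT.list("/user/root")`.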

Submit

PYSPARK_PYTHON=python3 /spark/bin/spark-submit --master spark://spark-master:17077 /app/main.py /spark/examples/src/main/resources/kv1.txt
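
The contents of /app/main.py are not shown here, so the following is only a guess at the kind of script this command could run: it takes the input path as its first argument and loads the key/value pairs into a DataFrame. It assumes kv1.txt uses the Hive-style Ctrl-A field separator; the function names and the application name are hypothetical.

```python
# Assumption: kv1.txt uses the Hive-style Ctrl-A (\x01) field separator.
SEPARATOR = "\x01"


def parse_kv(line):
    """Split one 'key<Ctrl-A>value' line into an (int, str) pair."""
    key, value = line.rstrip("\n").split(SEPARATOR, 1)
    return int(key), value


def main(path):
    # Imported here so parse_kv can be used without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kv-example").getOrCreate()
    pairs = spark.sparkContext.textFile(path).map(parse_kv)
    df = spark.createDataFrame(pairs, ["key", "value"])
    df.show(5)
    spark.stop()
```

Used as a script, it would end by calling `main(sys.argv[1])`, so the path given after /app/main.py on the spark-submit command line becomes the input file.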

Livy

Start the job (main.py must already be in HDFS at the given path):

$ curl -X POST --data '{"file": "hdfs://namenode:8020/user/root/main.py"}' -H "Content-Type: application/json" localhost:8998/batches
{"id":0,"state":"running","log":[]}

We can check the status:

$ curl localhost:8998/batches/0
{"id":0,"state":"success","log":[]}

To see the job output, append /log to the batch URL:

$ curl localhost:8998/batches/0/log
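
The three curl calls can also be scripted. The sketch below submits a batch, polls it until it finishes, and fetches the log, using only the endpoints shown above. The helper names, the polling interval, and the set of states treated as terminal are assumptions, not something this repository defines.

```python
import json
import time
from urllib.request import Request, urlopen

# Assumption: Livy is reachable on the port used by the curl calls above.
LIVY_URL = "http://localhost:8998"
# Assumption: states after which the batch is treated as finished.
TERMINAL_STATES = {"success", "dead", "killed"}


def http_json(url, payload=None):
    """GET `url`, or POST `payload` as JSON when one is given."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = Request(url, data=data, headers={"Content-Type": "application/json"})
    with urlopen(request) as response:
        return json.load(response)


def submit_batch(file, fetch=http_json):
    """Start a batch for `file` and return its id."""
    return fetch(f"{LIVY_URL}/batches", {"file": file})["id"]


def wait_for_batch(batch_id, fetch=http_json, interval=2.0):
    """Poll the batch until it reaches a terminal state, then return it."""
    while True:
        state = fetch(f"{LIVY_URL}/batches/{batch_id}")["state"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval)


def batch_log(batch_id, fetch=http_json):
    """Return the log lines Livy has collected for the batch."""
    return fetch(f"{LIVY_URL}/batches/{batch_id}/log")["log"]
```

The `fetch` parameter exists only so the functions can be exercised without a running Livy server; in normal use the default is enough, for example `wait_for_batch(submit_batch("hdfs://namenode:8020/user/root/main.py"))`.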

Reading data from database

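First fetch the PostgreSQL JDBC driver. Starting a shell with --packages downloads it into the Ivy cache (/root/.ivy2/jars):

/spark/bin/spark-shell --packages org.postgresql:postgresql:42.1.1

Then submit the job with the downloaded jar on the driver and executor classpaths:
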
/spark/bin/spark-submit \
    --driver-class-path '/root/.ivy2/jars/org.postgresql_postgresql-42.1.1.jar' \
    --jars '/root/.ivy2/jars/org.postgresql_postgresql-42.1.1.jar' \
    --conf 'spark.master=spark://spark-master:17077' \
    'hdfs://namenode:8020/app/read_df_from_db.py'
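
The read_df_from_db.py script itself is not shown here. As a minimal sketch of what such a job might look like, the snippet below reads one table through Spark's JDBC data source with the PostgreSQL driver. The host, database, table, and credentials are placeholders, and the function names are hypothetical.

```python
def jdbc_options(host, database, table, user, password, port=5432):
    """Build the option map for Spark's JDBC data source."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }


def read_table(spark, options):
    """Load the table described by `options` as a DataFrame."""
    return spark.read.format("jdbc").options(**options).load()


def main():
    # Imported here so jdbc_options can be used without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("read-df-from-db").getOrCreate()
    # Assumption: every connection value below is a placeholder.
    options = jdbc_options(
        "postgres", "example_db", "example_table", "example_user", "example_password"
    )
    read_table(spark, options).show()
    spark.stop()
```

The `driver` option names the class shipped in the jar passed via --jars and --driver-class-path, which is why both flags appear in the spark-submit command.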
