Now that you have followed the steps in the getting_started/installation.adoc section to install the operator and its dependencies, you can deploy an HDFS cluster and the services it depends on. Afterwards you can verify that it works by creating, checking and deleting a test file in HDFS.
To deploy a ZooKeeper cluster, create a file called `zk.yaml`:
link:example$getting_started/zk.yaml[role=include]
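To give a rough idea of what such a manifest contains, here is a minimal sketch of a `ZookeeperCluster` resource. The exact field names, version tag and replica count are assumptions and may differ from the `zk.yaml` provided with this guide:

[source,yaml]
----
# Hypothetical minimal ZookeeperCluster; field names and the version tag
# are assumptions and may differ from the zk.yaml shipped with this guide.
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk
spec:
  version: 3.8.0-stackable0.7.1  # product version plus Stackable image version (assumed example)
  servers:
    roleGroups:
      default:
        replicas: 1
----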
We also need to define a ZNode that will be used by the HDFS cluster to reference ZooKeeper. Create another file called `znode.yaml`:
link:example$getting_started/znode.yaml[role=include]
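As a sketch, a `ZookeeperZnode` resource referencing the cluster above could look roughly like this; the resource name and field layout are assumptions:

[source,yaml]
----
# Hypothetical ZookeeperZnode pointing at the simple-zk cluster defined above.
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperZnode
metadata:
  name: simple-hdfs-znode
spec:
  clusterRef:
    name: simple-zk
----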
Apply both of these files:
link:example$getting_started/getting_started.sh[role=include]
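Assuming both manifests sit in the current directory, this boils down to a single `kubectl apply`:

[source,bash]
----
kubectl apply -f zk.yaml -f znode.yaml
----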
The state of the ZooKeeper cluster can be tracked with `kubectl`:
link:example$getting_started/getting_started.sh[role=include]
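For example, you can wait for the ZooKeeper StatefulSet to report all replicas as ready. The StatefulSet name below is an assumption derived from the cluster name `simple-zk` and its default role group:

[source,bash]
----
# Block until the ZooKeeper pods are rolled out (StatefulSet name is an assumption).
kubectl rollout status --watch statefulset/simple-zk-server-default --timeout=300s
----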
An HDFS cluster has three components: the namenode, the datanode and the journalnode. Create a file named `hdfs.yaml` defining two namenodes as well as one datanode and one journalnode:
link:example$getting_started/hdfs.yaml[role=include]
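The following is a hedged sketch of what such an `HdfsCluster` manifest might look like; the API version, field names and image tag are assumptions and the file shipped with this guide may differ:

[source,yaml]
----
# Hypothetical HdfsCluster with two namenodes and one datanode and
# journalnode each; field names and the version tag are assumptions.
apiVersion: hdfs.stackable.tech/v1alpha1
kind: HdfsCluster
metadata:
  name: simple-hdfs
spec:
  version: 3.3.3-stackable0.1.0  # Hadoop version plus Stackable image version (assumed example)
  zookeeperConfigMapName: simple-hdfs-znode  # discovery ConfigMap created for the ZNode (assumed)
  nameNodes:
    roleGroups:
      default:
        replicas: 2
  dataNodes:
    roleGroups:
      default:
        replicas: 1
  journalNodes:
    roleGroups:
      default:
        replicas: 1
----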
Where:

- `metadata.name` contains the name of the HDFS cluster
- the label of the Docker image provided by Stackable must be set in `spec.version`
[NOTE]
====
Please note that the version you need to specify for `spec.version` is not only the version of Hadoop which you want to roll out, but has to be amended with a Stackable version as shown. This Stackable version is the version of the underlying container image which is used to execute the processes. For a list of available versions please check our image registry. It should generally be safe to simply use the latest image version that is available.
====
Create the actual HDFS cluster by applying the file:
link:example$getting_started/getting_started.sh[role=include]
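In essence this is a plain `kubectl apply` of the manifest you just created:

[source,bash]
----
kubectl apply -f hdfs.yaml
----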
Track the progress with `kubectl`, as this step may take a few minutes:
link:example$getting_started/getting_started.sh[role=include]
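One way to do this is to wait for the StatefulSets of the three roles to become ready; the names below are assumptions derived from the cluster name `simple-hdfs` and its default role groups:

[source,bash]
----
# Wait for journalnodes, namenodes and datanodes in turn
# (StatefulSet names are assumptions).
for role in journalnode namenode datanode; do
  kubectl rollout status --watch "statefulset/simple-hdfs-${role}-default" --timeout=600s
done
----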
To test the cluster you can create a new file, check its status and then delete it. We will execute these actions from within a helper pod. Create a file called `webhdfs.yaml`:
link:example$getting_started/webhdfs.yaml[role=include]
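Conceptually, the helper only needs to be a long-running pod with `curl` installed. A hypothetical minimal version, not necessarily identical to the `webhdfs.yaml` from this guide, could look like this:

[source,yaml]
----
# Hypothetical helper pod; the pod name and image are assumptions,
# any image that ships curl will do.
apiVersion: v1
kind: Pod
metadata:
  name: webhdfs
spec:
  containers:
    - name: webhdfs
      image: curlimages/curl:8.5.0
      command: ["sleep", "infinity"]
----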
Apply it and monitor its progress:
link:example$getting_started/getting_started.sh[role=include]
link:example$getting_started/getting_started.sh[role=include]
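With the sketch above this amounts to applying the manifest and waiting for the pod to become ready; the pod name is an assumption and may differ in the provided file:

[source,bash]
----
kubectl apply -f webhdfs.yaml
kubectl wait --for=condition=Ready pod/webhdfs --timeout=300s
----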
To begin with, the cluster should be empty. You can verify this by listing all resources at the root directory, which should return an empty array:
link:example$getting_started/getting_started.sh[role=include]
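Under the hood this is a WebHDFS `LISTSTATUS` call issued from inside the helper pod. The namenode host name below is an assumption derived from the cluster name `simple-hdfs`:

[source,bash]
----
# List the HDFS root directory via WebHDFS (host name is an assumption).
kubectl exec webhdfs -- curl -s -XGET \
  "http://simple-hdfs-namenode-default-0.simple-hdfs-namenode-default.default.svc.cluster.local:9870/webhdfs/v1/?op=LISTSTATUS"
----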
Creating a file in HDFS using the WebHDFS API requires a two-step `PUT` (the reason for the two-step create/append is to prevent clients from sending out data before the redirect). First, create a file with some text in it called `testdata.txt` and copy it to the `/tmp` directory on the helper pod:
link:example$getting_started/getting_started.sh[role=include]
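For example, assuming the helper pod is called `webhdfs`, this can be done with `kubectl cp`:

[source,bash]
----
# Create a small test file locally and copy it into the helper pod.
echo "hello HDFS" > testdata.txt
kubectl cp ./testdata.txt webhdfs:/tmp/testdata.txt
----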
Then use `curl` to issue a `PUT` command:
link:example$getting_started/getting_started.sh[role=include]
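The first `PUT` does not send any data; it asks the namenode where the file should go and receives a redirect to a datanode. A hedged sketch of this request, with an assumed namenode host name and `user.name`, and `-i` added so the `Location` header is visible (the script in this guide may phrase it slightly differently):

[source,bash]
----
# Step 1 of the two-step PUT: no data is sent yet, the namenode replies
# with a redirect (Location header) pointing at a datanode.
kubectl exec webhdfs -- curl -s -i -XPUT \
  "http://simple-hdfs-namenode-default-0.simple-hdfs-namenode-default.default.svc.cluster.local:9870/webhdfs/v1/testdata.txt?user.name=stackable&op=CREATE"
----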
This will return a location that will look something like this:

----
http://simple-hdfs-datanode-default-0.simple-hdfs-datanode-default.default.svc.cluster.local:9864/webhdfs/v1/testdata.txt?op=CREATE&user.name=stackable&namenoderpcaddress=simple-hdfs&createflag=&createparent=true&overwrite=false
----
You can assign this to a local variable, e.g. `$location`, or copy and paste it directly as the URL, and then issue a second `PUT` like this:
link:example$getting_started/getting_started.sh[role=include]
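A sketch of that second `PUT`, assuming the location has been stored in a shell variable `$location`, uploads the file that was copied to the helper pod earlier:

[source,bash]
----
# Step 2: send the actual file contents to the datanode returned in step 1.
# $location is expanded by your local shell before the command runs in the pod.
kubectl exec webhdfs -- curl -s -i -XPUT -T /tmp/testdata.txt "$location"
----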
Checking the status again with:
link:example$getting_started/getting_started.sh[role=include]
will now display some metadata about the file that was created in the HDFS cluster:
[source,json]
----
{
  "FileStatuses": {
    "FileStatus": [
      {
        "accessTime": 1660821734999,
        "blockSize": 134217728,
        "childrenNum": 0,
        "fileId": 16396,
        "group": "supergroup",
        "length": 597,
        "modificationTime": 1660821735602,
        "owner": "stackable",
        "pathSuffix": "testdata.txt",
        "permission": "644",
        "replication": 3,
        "storagePolicy": 0,
        "type": "FILE"
      }
    ]
  }
}
----
To clean up, the file can be deleted like this:
link:example$getting_started/getting_started.sh[role=include]
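The deletion is a single WebHDFS `DELETE` call; as before, the host name and `user.name` are assumptions:

[source,bash]
----
# Remove the test file again.
kubectl exec webhdfs -- curl -s -XDELETE \
  "http://simple-hdfs-namenode-default-0.simple-hdfs-namenode-default.default.svc.cluster.local:9870/webhdfs/v1/testdata.txt?user.name=stackable&op=DELETE"
----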
Look at the usage-guide/index.adoc to find out more about configuring your HDFS cluster.