This Operator manages Apache NiFi clusters on Kubernetes. Apache NiFi is an open-source data integration tool that provides a web-based interface for designing, monitoring and managing data flows between various systems and devices, using a visual programming approach. It supports a wide range of data sources and formats, and offers features such as data provenance, security and clustering.
Get started with Apache NiFi and the Stackable Operator by following the getting_started/index.adoc guide. It walks you through the installation process and shows you how to connect to the NiFi web interface. Afterwards, have a look at the usage_guide/index.adoc to learn how to configure your NiFi instance to your needs, or run some demos to learn more about using NiFi with other components.
The Operator manages the NifiCluster custom resource. NiFi only has a single process that it needs to run, so the NifiCluster has only a single role: node. This role can be divided into multiple role groups.
For every role group the Operator creates a ConfigMap and a StatefulSet, which can have multiple replicas (Pods). Every role group is accessible through its own Service, and there is a Service for the whole cluster.
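As a rough illustration of the role and role group structure described above, a NifiCluster resource might be laid out as in the following fragment. This is a sketch, not a complete or authoritative manifest: the API version, the `nodes`/`roleGroups` field layout, the cluster name and the role group names are assumptions made for illustration, and a real cluster needs further settings that are omitted here.

```yaml
# Illustrative fragment only; a real NifiCluster requires additional
# settings (for example the product image and the ZooKeeper reference).
apiVersion: nifi.stackable.tech/v1alpha1  # assumed API group and version
kind: NifiCluster
metadata:
  name: simple-nifi        # hypothetical cluster name
spec:
  nodes:                   # the single role of a NifiCluster
    roleGroups:
      default:             # hypothetical role group name
        replicas: 2        # Pods in this role group's StatefulSet
      high-memory:         # hypothetical second role group
        replicas: 1
```

Under these assumptions, the Operator would create a ConfigMap, a StatefulSet and a Service for each of the two role groups, plus one Service for the whole cluster.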
Apache NiFi depends on Apache ZooKeeper, which you can run in Kubernetes with the Stackable Operator for Apache ZooKeeper (zookeeper:index.adoc).
NiFi is often a good choice as the first step in a data pipeline, where data has to be fetched in various formats from various sources. The stackablectl::demos/data-lakehouse-iceberg-trino-spark.adoc demo uses NiFi to fetch six different datasets in various formats. The data is then ingested into a Kafka topic. Apache Kafka is also part of the Stackable platform.
The stackablectl::demos/nifi-kafka-druid-earthquake-data.adoc and stackablectl::demos/nifi-kafka-druid-water-level-data.adoc demos use NiFi in the same way: both showcase downloading data from web APIs and ingesting it into Kafka.
The Stackable Operator for Apache NiFi currently supports the following versions of NiFi: