This Helm chart deploys the vertica-kafka-scheduler in one of two modes:

- `initializer`: Configuration mode. Starts a container so that you can `exec` into it and configure it.
- `launcher`: Launch mode. Launches the vkconfig scheduler: starts a container that calls `vkconfig launch` automatically. Run this mode after you configure the container in `initializer` mode.
Add the charts to your repo and install the Helm chart. The following `helm install` command uses the `image.tag` parameter to install version 24.1.0:

```shell
$ helm repo add vertica-charts https://vertica.github.io/charts
$ helm repo update
$ helm install vkscheduler vertica-charts/vertica-kafka-scheduler \
    --set "image.tag=24.1.0"
```
The following dropdowns provide sample manifests for a Kafka cluster, VerticaDB operator and custom resource (CR), and vkconfig scheduler. These manifests are applied in Usage to demonstrate a simple deployment:
kafka-cluster.yaml (with Strimzi operator)

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  namespace: kafka
  name: my-cluster
spec:
  kafka:
    version: 3.6.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      default.replication.factor: 1
      min.insync.replicas: 1
      inter.broker.protocol.version: "3.6"
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 100Gi
          deleteClaim: false
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
```
vdb-op-cr.yaml

```yaml
apiVersion: vertica.com/v1
kind: VerticaDB
metadata:
  annotations:
    vertica.com/include-uid-in-path: "false"
    vertica.com/vcluster-ops: "false"
  name: vdb-1203
spec:
  communal:
    credentialSecret: ""
    endpoint: https://s3.amazonaws.com
    path: s3://<path>/<to>/<s3-bucket>
  image: vertica/vertica-k8s:12.0.3-0
  initPolicy: Create
  subclusters:
    - name: sc0
      size: 3
      type: primary
```
vertica-kafka-scheduler.yaml

```yaml
image:
  repository: opentext/kafka-scheduler
  pullPolicy: IfNotPresent
  tag: 12.0.3
launcherEnabled: false
replicaCount: 1
initializerEnabled: true
conf:
  generate: true
  content:
    config-schema: Scheduler
    username: dbadmin
    dbport: '5433'
    enable-ssl: 'false'
    dbhost: 10.20.30.40
tls:
  enabled: false
serviceAccount:
  create: true
```
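When `conf.generate` is `true`, the chart renders the `conf.content` key-value pairs into a `vkconfig.conf` ConfigMap that the scheduler container mounts. Assuming standard properties-file formatting, the generated file would look roughly like this (a sketch derived from the values above, not actual chart output):

```
config-schema=Scheduler
username=dbadmin
dbport=5433
enable-ssl=false
dbhost=10.20.30.40
```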
The following sections deploy a Kafka cluster and a VerticaDB operator and CR on Kubernetes. Then, they show you how to configure Vertica to consume data from Kafka by setting up the necessary tables and configuring the scheduler. Finally, you launch the scheduler and send data on the command line to test the implementation.
Apply manifests on Kubernetes to create a Kafka cluster, VerticaDB operator, and VerticaDB CR:
1. Create a namespace. The following command creates a namespace named `kafka`:

   ```shell
   kubectl create namespace kafka
   ```

2. Create the Kafka custom resource. Apply the kafka-cluster.yaml manifest:

   ```shell
   kubectl apply -f kafka-cluster.yaml
   ```

3. Deploy the VerticaDB operator and custom resource. The vdb-op-cr.yaml manifest deploys version 12.0.3. Before you apply the manifest, edit `spec.communal.path` to provide a path to an existing S3 bucket:

   ```shell
   kubectl apply -f vdb-op-cr.yaml
   ```
Create tables and resources so that Vertica can consume data from a Kafka topic:
1. Create a Vertica flex table to store the Kafka messages:

   ```sql
   CREATE FLEX TABLE KafkaFlex();
   ```

2. Create the Kafka user:

   ```sql
   CREATE USER KafkaUser;
   ```

3. Create a resource pool:

   ```sql
   CREATE RESOURCE POOL scheduler_pool PLANNEDCONCURRENCY 1;
   ```
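Depending on your database's security configuration, `KafkaUser` might also need privileges on the target table and the resource pool before the scheduler can load data on its behalf. A sketch of typical grants (adjust to your own security policy; these exact statements are not part of the walkthrough above):

```sql
GRANT ALL ON TABLE public.KafkaFlex TO KafkaUser;
GRANT USAGE ON RESOURCE POOL scheduler_pool TO KafkaUser;
```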
Start the Kafka service, and create a Kafka topic that the scheduler can consume data from:
1. Open a new shell and start the Kafka producer:

   ```shell
   kubectl --namespace kafka run kafka-producer -ti --image=quay.io/strimzi/kafka:0.38.0-kafka-3.6.0 --rm=true --restart=Never -- bash
   ```

2. Create the Kafka topic that the scheduler subscribes to:

   ```shell
   bin/kafka-console-producer.sh --bootstrap-server my-cluster-kafka-bootstrap.kafka:9092 --topic KafkaTopic1
   ```
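Because the cluster manifest above deploys the Strimzi `entityOperator` with a `topicOperator`, you can alternatively create the topic declaratively instead of relying on the broker to auto-create it. A hypothetical `KafkaTopic` manifest (names assume the `my-cluster` cluster above; `spec.topicName` carries the mixed-case Kafka topic name):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: kafkatopic1
  namespace: kafka
  labels:
    strimzi.io/cluster: my-cluster
spec:
  topicName: KafkaTopic1
  partitions: 1
  replicas: 1
```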
Deploy the scheduler container in initializer mode, and configure the scheduler to consume data from the Kafka topic:
1. Deploy the vertica-kafka-scheduler Helm chart with the vertica-kafka-scheduler.yaml values file. This values file has `initializerEnabled` set to `true` so you can configure the vkconfig container before you launch the scheduler:

   ```shell
   helm install --namespace main vk1 vertica-charts/vertica-kafka-scheduler \
     -f vertica-kafka-scheduler.yaml
   ```

2. Use `kubectl exec` to get a shell in the initializer pod:

   ```shell
   kubectl exec --namespace main -it vk1-vertica-kafka-scheduler-initializer -- bash
   ```
3. Set configuration options for the scheduler. For descriptions of each of the following options, see vkconfig script options:

   ```shell
   # scheduler options
   vkconfig scheduler --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
     --frame-duration 00:00:10 \
     --create --operator KafkaUser \
     --eof-timeout-ms 2000 \
     --config-refresh 00:01:00 \
     --new-source-policy START \
     --resource-pool scheduler_pool

   # target options
   vkconfig target --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
     --target-schema public \
     --target-table KafkaFlex

   # load spec options
   vkconfig load-spec --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
     --load-spec KafkaSpec \
     --parser kafkajsonparser \
     --load-method DIRECT \
     --message-max-bytes 1000000

   # cluster options
   vkconfig cluster --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
     --cluster KafkaCluster \
     --hosts my-cluster-kafka-bootstrap.kafka:9092

   # source options
   vkconfig source --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
     --cluster KafkaCluster \
     --source KafkaTopic1 \
     --partitions 1

   # microbatch options
   vkconfig microbatch --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
     --microbatch KafkaBatch1 \
     --add-source KafkaTopic1 \
     --add-source-cluster KafkaCluster \
     --target-schema public \
     --target-table KafkaFlex \
     --rejection-schema public \
     --rejection-table KafkaFlex_rej \
     --load-spec KafkaSpec
   ```
After you configure the scheduler options, you can deploy it in launcher mode:
```shell
helm upgrade --namespace main vk1 vertica-charts/vertica-kafka-scheduler \
  --set "launcherEnabled=true"
```
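Equivalently, you can flip the flag in the values file and upgrade with `-f` instead of `--set`; a sketch of the changed line:

```yaml
# vertica-kafka-scheduler.yaml: switch from initializer to launcher mode
launcherEnabled: true
```

Then run `helm upgrade --namespace main vk1 vertica-charts/vertica-kafka-scheduler -f vertica-kafka-scheduler.yaml`. Keeping the flag in the values file makes the mode switch visible in version control rather than only in the release history.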
Now that you have a containerized Kafka cluster and VerticaDB CR running, you can test that the scheduler is automatically sending data from the Kafka producer to Vertica:
1. In the terminal that is running your Kafka producer, send sample JSON data:

   ```
   >{"a": 1}
   >{"a": 1000}
   ```

2. In a different terminal, open `vsql` and query the `KafkaFlex` table for the data:

   ```
   => SELECT compute_flextable_keys_and_build_view('KafkaFlex');
                                compute_flextable_keys_and_build_view
   --------------------------------------------------------------------------------------------------------
    Please see public.KafkaFlex_keys for updated keys
    The view public.KafkaFlex_view is ready for querying
   (1 row)

   => SELECT a from KafkaFlex_view;
     a
   ------
    1
    1000
   (2 rows)
   ```
- `affinity`: Applies affinity rules that constrain the scheduler to specific nodes.
- `conf.configMapName`: Name of the ConfigMap to use and optionally generate. If omitted, the chart picks a suitable default.
- `conf.content`: Set of key-value pairs in the generated ConfigMap. If `conf.generate` is `false`, this setting is ignored.
- `conf.generate`: When set to `true`, the Helm chart controls the creation of the `vkconfig.conf` ConfigMap. Default: `true`
- `fullNameOverride`: Gives the Helm chart full control over the name of the objects that get created. This takes precedence over `nameOverride`.
- `initializerEnabled`: When set to `true`, the initializer pod is created. This can be used to run any setup tasks needed. Default: `true`
- `image.pullPolicy`: How often Kubernetes pulls the image for an object. For details, see Updating Images in the Kubernetes documentation. Default: `IfNotPresent`
- `image.repository`: The image repository and name that contain the Vertica Kafka Scheduler. Default: `opentext/kafka-scheduler`
- `image.tag`: Version of the Vertica Kafka Scheduler. This setting must match the version of the Vertica server that the scheduler connects to. Default: the Helm chart's `appVersion`
- `imagePullSecrets`: List of Secrets that contain the required credentials to pull the image.
- `launcherEnabled`: When set to `true`, the Helm chart creates the launch deployment. Enable this setting after you configure the scheduler options in the container. Default: `true`
- `jvmOpts`: Values to assign to the `VKCONFIG_JVM_OPTS` environment variable in the pods. NOTE: You can omit most truststore and keystore settings because they are set by the `tls.*` parameters.
- `nameOverride`: Controls the name of the objects that get created. This is combined with the Helm chart release to form the name.
- `nodeSelector`: nodeSelector that controls where the pod is scheduled.
- `podAnnotations`: Annotations that you want to attach to the pods.
- `podSecurityContext`: Security context for the pods.
- `replicaCount`: Number of launch pods that the chart deploys. Default: `1`
- `resources`: Host resources to use for the pod.
- `securityContext`: Security context for the container in the pod.
- `serviceAccount.annotations`: Annotations to attach to the ServiceAccount.
- `serviceAccount.create`: When set to `true`, a ServiceAccount is created as part of the deployment. Default: `true`
- `serviceAccount.name`: Name of the ServiceAccount. If this parameter is not set and `serviceAccount.create` is set to `true`, a name is generated using the fullname template.
- `timezone`: Sets the timezone of the logger. Because logging uses log4j, you must use a Java-friendly timezone ID. For available IDs, see https://docs.oracle.com/middleware/1221/wcs/tag-ref/MISC/TimeZones.html. Default: `UTC`
- `tls.enabled`: When set to `true`, the scheduler is set up for TLS authentication. Default: `false`
- `tls.keyStoreMountPath`: Directory where the keystore is mounted in the pod. The full path to the keystore is constructed by combining this parameter and `tls.keyStoreSecretKey`.
- `tls.keyStorePassword`: Password that protects the keystore. If this setting is omitted, then no password is used.
- `tls.keyStoreSecretKey`: Key within `tls.keyStoreSecretName` that is used as the keystore file name. This setting and `tls.keyStoreMountPath` form the full path to the keystore in the pod.
- `tls.keyStoreSecretName`: Name of an existing Secret that contains the keystore. If this setting is omitted, no keystore information is included.
- `tls.trustStoreMountPath`: Directory where the truststore is mounted in the pod. The full path to the truststore is constructed by combining this parameter with `tls.trustStoreSecretKey`.
- `tls.trustStorePassword`: Password that protects the truststore. If this setting is omitted, then no password is used.
- `tls.trustStoreSecretKey`: Key within `tls.trustStoreSecretName` that is used as the truststore file name. This is used with `tls.trustStoreMountPath` to form the full path to the truststore in the pod.
- `tls.trustStoreSecretName`: Name of an existing Secret that contains the truststore. If this setting is omitted, then no truststore information is included.
- `tolerations`: Applies tolerations that control where the pod is scheduled.