diff --git a/README.md b/README.md
index d5a1d55c733c..26d22d8779f2 100644
--- a/README.md
+++ b/README.md
@@ -8,11 +8,11 @@
 Click here to watch a screencast of the KSQL demo on YouTube. KSQL screencast

-# Quick Start +# Getting Started If you are ready to see the power of KSQL, try out these: -- [KSQL Quick Start](https://github.com/confluentinc/ksql/tree/v0.4/docs/quickstart#quick-start): Demonstrates a simple workflow using KSQL to write streaming queries against data in Kafka. -- [Clickstream Analysis Demo](https://github.com/confluentinc/ksql/tree/v0.4/ksql-clickstream-demo#clickstream-analysis): Shows how to build an application that performs real-time user analytics. +- [KSQL Quick Start](https://docs.confluent.io/current/ksql/docs/quickstart/): Demonstrates a simple workflow using KSQL to write streaming queries against data in Kafka. +- [Clickstream Analysis Demo](https://docs.confluent.io/current/ksql/docs/ksql-clickstream-demo/): Shows how to build an application that performs real-time user analytics. # Use Cases and Examples @@ -61,7 +61,7 @@ CREATE TABLE error_counts AS # Documentation -You can [find the KSQL documentation here](https://github.com/confluentinc/ksql/tree/v0.4/docs#ksql-documentation) +You can find the KSQL documentation at [docs.confluent.io](https://docs.confluent.io/current/ksql/docs/index.html). # Join the Community Whether you need help, want to contribute, or are just looking for the latest news, you can find out how to [connect with your fellow Confluent community members here](https://www.confluent.io/contact-us-thank-you/). @@ -70,10 +70,13 @@ Whether you need help, want to contribute, or are just looking for the latest ne * Join the [Confluent Google group](https://groups.google.com/forum/#!forum/confluent-platform). # Contributing -Contributions to the code, examples, documentation, etc, are very much appreciated. For more information, see the [contribution guidelines](/docs/contributing.md). +Contributions to the code, examples, documentation, etc, are very much appreciated. For more information, see the [contribution guidelines](contributing.md). - Report issues and bugs directly in [this GitHub project](https://github.com/confluentinc/ksql/issues). +# Issues +Report issues in [this GitHub project](https://github.com/confluentinc/ksql/issues). + # License The project is licensed under the Apache License, version 2.0. diff --git a/docs/contributing.md b/contributing.md similarity index 74% rename from docs/contributing.md rename to contributing.md index 5a0366e4e3cf..384db4dd2e83 100644 --- a/docs/contributing.md +++ b/contributing.md @@ -1,4 +1,8 @@ # Contributing +- [How to Contribute](#how-to-contribute) + - [General Guidelines](#general-guidelines) + - [GitHub Workflow](#github-workflow) +- [Building the docs locally](#building-the-docs-locally) ## How to Contribute @@ -104,6 +108,32 @@ When submitting a pull request (PR), use the following guidelines: git push origin --force feature-xxx ``` - ### Issues +### Building the docs locally - Report issues in [this GitHub project](https://github.com/confluentinc/ksql/issues). \ No newline at end of file +This documentation is built using [Sphinx](http://sphinx-doc.org). It also uses some extensions for theming and REST API +documentation support. + +Follow these instructions to build a local version of the documentation. + +Start by installing the requirements: + + pip install -r requirements.txt + +Then you can generate the HTML version of the docs: + + make html + +The root of the documentation will be at `_build/html/index.html` + +While editing the documentation, you can get a live preview using python-livepreview. 
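+For reference, the monitoring script used below (`autoreload.py`) can be written in a few lines with the `livereload` Python package. The following is only a rough sketch of such a watcher and may differ from the script shipped in this repository:
+
+    # Hypothetical sketch of a Sphinx live-reload watcher; assumes the `livereload` package.
+    from livereload import Server, shell
+
+    server = Server()
+    # Rebuild the HTML docs whenever a source file changes.
+    server.watch('*.rst', shell('make html'))
+    server.watch('conf.py', shell('make html'))
+    # Serve the built docs and trigger a browser reload after each rebuild.
+    server.serve(root='_build/html')
+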
Install the Python library: + + pip install livereload + +Then run the monitoring script in the background: + + python autoreload.py & + +If you install the [browser extensions](http://livereload.com/) then everything should update every time any files are +saved without any manual steps on your part. + +Note: If you're already running the autoreloader, you may need to run `make clean html` if you add new sections. \ No newline at end of file diff --git a/docs/Makefile b/docs/Makefile new file mode 100644 index 000000000000..e0722507ee3c --- /dev/null +++ b/docs/Makefile @@ -0,0 +1,181 @@ +# Makefile for Sphinx documentation +# + +# You can set these variables from the command line. +SPHINXOPTS = +SPHINXBUILD = sphinx-build +PAPER = +BUILDDIR = _build + +# User-friendly check for sphinx-build +ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1) +$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/) +endif + +# Internal variables. +PAPEROPT_a4 = -D latex_paper_size=a4 +PAPEROPT_letter = -D latex_paper_size=letter +ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . +# the i18n builder cannot share the environment and doctrees with the others +I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . + +.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext livehtml + +help: + @echo "Please use \`make ' where is one of" + @echo " html to make standalone HTML files" + @echo " livehtml to make standalone HTML files automatically watching for changes" + @echo " dirhtml to make HTML files named index.html in directories" + @echo " singlehtml to make a single large HTML file" + @echo " pickle to make pickle files" + @echo " json to make JSON files" + @echo " htmlhelp to make HTML files and a HTML help project" + @echo " qthelp to make HTML files and a qthelp project" + @echo " devhelp to make HTML files and a Devhelp project" + @echo " epub to make an epub" + @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" + @echo " latexpdf to make LaTeX files and run them through pdflatex" + @echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx" + @echo " text to make text files" + @echo " man to make manual pages" + @echo " texinfo to make Texinfo files" + @echo " info to make Texinfo files and run them through makeinfo" + @echo " gettext to make PO message catalogs" + @echo " changes to make an overview of all changed/added/deprecated items" + @echo " xml to make Docutils-native XML files" + @echo " pseudoxml to make pseudoxml-XML files for display purposes" + @echo " linkcheck to check all external links for integrity" + @echo " doctest to run all doctests embedded in the documentation (if enabled)" + +clean: + rm -rf $(BUILDDIR)/* + +html: + $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html + @echo + @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." + +livehtml: + python autoreload.py + +dirhtml: + $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml + @echo + @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." 
+ +singlehtml: + $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml + @echo + @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." + +pickle: + $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle + @echo + @echo "Build finished; now you can process the pickle files." + +json: + $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json + @echo + @echo "Build finished; now you can process the JSON files." + +htmlhelp: + $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp + @echo + @echo "Build finished; now you can run HTML Help Workshop with the" \ + ".hhp project file in $(BUILDDIR)/htmlhelp." + +qthelp: + $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp + @echo + @echo "Build finished; now you can run "qcollectiongenerator" with the" \ + ".qhcp project file in $(BUILDDIR)/qthelp, like this:" + @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/KafkaRESTProxy.qhcp" + @echo "To view the help file:" + @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/KafkaRESTProxy.qhc" + +devhelp: + $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp + @echo + @echo "Build finished." + @echo "To view the help file:" + @echo "# mkdir -p $$HOME/.local/share/devhelp/KafkaRESTProxy" + @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/KafkaRESTProxy" + @echo "# devhelp" + +epub: + $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub + @echo + @echo "Build finished. The epub file is in $(BUILDDIR)/epub." + +latex: + $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex + @echo + @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." + @echo "Run \`make' in that directory to run these through (pdf)latex" \ + "(use \`make latexpdf' here to do that automatically)." + +latexpdf: + $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex + @echo "Running LaTeX files through pdflatex..." + $(MAKE) -C $(BUILDDIR)/latex all-pdf + @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." + +latexpdfja: + $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex + @echo "Running LaTeX files through platex and dvipdfmx..." + $(MAKE) -C $(BUILDDIR)/latex all-pdf-ja + @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." + +text: + $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text + @echo + @echo "Build finished. The text files are in $(BUILDDIR)/text." + +man: + $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man + @echo + @echo "Build finished. The manual pages are in $(BUILDDIR)/man." + +texinfo: + $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo + @echo + @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." + @echo "Run \`make' in that directory to run these through makeinfo" \ + "(use \`make info' here to do that automatically)." + +info: + $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo + @echo "Running Texinfo files through makeinfo..." + make -C $(BUILDDIR)/texinfo info + @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." + +gettext: + $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale + @echo + @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." + +changes: + $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes + @echo + @echo "The overview file is in $(BUILDDIR)/changes." 
+ +linkcheck: + $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck + @echo + @echo "Link check complete; look for any errors in the above output " \ + "or in $(BUILDDIR)/linkcheck/output.txt." + +doctest: + $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest + @echo "Testing of doctests in the sources finished, look at the " \ + "results in $(BUILDDIR)/doctest/output.txt." + +xml: + $(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml + @echo + @echo "Build finished. The XML files are in $(BUILDDIR)/xml." + +pseudoxml: + $(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml + @echo + @echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml." diff --git a/docs/README.md b/docs/README.md index bd1006fa8565..8b307afd4e5d 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,33 +1,5 @@ # KSQL Documentation - -| Overview |[Quick Start](/docs/quickstart#quick-start) | [Concepts](/docs/concepts.md#concepts) | [Syntax Reference](/docs/syntax-reference.md#syntax-reference) |[Demo](/ksql-clickstream-demo#clickstream-analysis) | [Examples](/docs/examples.md#examples) | [FAQ](/docs/faq.md#frequently-asked-questions) | -|---|----|-----|----|----|----|----| - -> *Important: This release is a **developer preview** and is free and open-source from Confluent under the Apache 2.0 license. Do not run KSQL against a production cluster.* - -# Overview -KSQL is an open source streaming SQL engine that implements continuous, interactive queries against Apache Kafka™. It allows you to query, read, write, and process data in Apache Kafka in real-time, at scale using SQL commands. KSQL interacts directly with the [Kafka Streams API](https://kafka.apache.org/documentation/streams/), removing the requirement of building a Java app. - -### Use cases -Common KSQL use cases are: - -- Fraud detection - identify and act on out of the ordinary data to provide real-time awareness. -- Personalization - create real-time experiences and insight for end users driven by data. -- Notifications - build custom alerts and messages based on real-time data. -- Real-time Analytics - power real-time dashboards to understand what’s happening as it does. -- Sensor data and IoT - understand and deliver sensor data how and where it needs to be. -- Customer 360 - provide a clear, real-time understanding of your customers across every interaction. - -KSQL lowers the barriers for using real-time data in your applications. It is powered by a scalable streaming platform without the learning curve or additional management complexity of other stream processing solutions. - -## Modes of operation - -You can use KSQL in standalone, client-server, application, and embedded modes. See [Concepts](/docs/concepts.md#concepts) for more information. - -## Getting Started - -* Beginners: Try the [interactive quick start](/docs/quickstart#quick-start). The quick start configures a single instance in a lightweight Docker container or in a Kafka cluster. It demonstrates a simple workflow using KSQL to write streaming queries against data in Kafka. -* Advanced users: Try the [end-to-end Clickstream Analysis demo](/ksql-clickstream-demo#clickstream-analysis). +The KSQL documentation is available on the Confluent Platform documentation site at [docs.confluent.io](https://docs.confluent.io/current/ksql/docs/index.html). 
## Interoperability @@ -38,3 +10,5 @@ This table shows the version compatibility matrix of which Kafka clusters can be | Apache Kafka | 0.10.1 and later | 0.10.1 and later | 0.10.1 and later | 0.10.1 and later | | Confluent Platform | 3.1.0 and later | 3.1.0 and later | 3.1.0 and later | 3.1.0 and later | +# Contributing +This documentation is built using [Sphinx](http://sphinx-doc.org). For information on how to contribute, see the [contributing guidelines](contributing.md). \ No newline at end of file diff --git a/docs/concepts.rst b/docs/concepts.rst new file mode 100644 index 000000000000..eb5c8171cf6a --- /dev/null +++ b/docs/concepts.rst @@ -0,0 +1,167 @@ +.. _ksql_concepts: + +Concepts +======== + +.. contents:: + +========== +Components +========== + +The main components of KSQL are the KSQL CLI and the KSQL server. + +KSQL CLI +-------- + +The KSQL CLI allows you to interactively write KSQL queries. Its +interface should be familiar to users of MySQL, Postgres, Oracle, Hive, +Presto, etc. + +The KSQL CLI acts as a client to the KSQL server. + +KSQL Server +----------- + +The KSQL server runs the engine that executes KSQL queries, which +includes the data processing as well as reading data from and writing +data to the target Kafka cluster. + +Servers can run in containers, virtual machines, and bare-metal machines. You can add or remove multiple servers in the +same resource pool to elastically scale query processing in or out. You can use different resource pools to achieve +workload isolation. + +=========== +Terminology +=========== + +When using KSQL, the following terminology is used. + +Stream +------ + +A stream is an unbounded sequence of structured data (“facts”). For +example, we could have a stream of financial transactions such as “Alice +sent $100 to Bob, then Charlie sent $50 to Bob”. Facts in a stream are +immutable, which means new facts can be inserted to a stream, but +existing facts can never be updated or deleted. Streams can be created +from a Kafka topic or derived from an existing table. A stream’s underlying data is durably stored (persisted) within a +Kafka topic on the Kafka brokers. + +Table +----- + +A table is a view of a stream, or another table, and represents a +collection of evolving facts. For example, we could have a table that +contains the latest financial information such as “Bob’s current account +balance is $150”. It is the equivalent of a traditional database table +but enriched by streaming semantics such as windowing. Facts in a table +are mutable, which means new facts can be inserted to the table, and +existing facts can be updated or deleted. Tables can be created from a +Kafka topic or derived from existing streams and tables. In both cases, +a table’s underlying data is durably stored (persisted) within a Kafka +topic on the Kafka brokers. + +.. _modes-of-operation: + +================== +Modes of operation +================== + +Standalone mode +--------------- + +In standalone mode, both the KSQL client and server components are +co-located on the same machine, in the same JVM, and are started +together. This makes standalone mode very convenient for local +development and testing. + +.. image:: img/standalone-mode.png + +To run KSQL in standalone mode: + +- Start the KSQL CLI and the server components all in the same JVM: + + - Start with default settings: + + .. 
code:: bash + + $ ./bin/ksql-cli local + + - Start with :ref:`custom + settings `, pointing + KSQL at a specific Kafka cluster (see the Streams + `bootstrap.servers ` + setting): + + .. code:: bash + + $ ./bin/ksql-cli local --bootstrap-server kafka-broker-1:9092 \ + --properties-file path/to/ksql-cli.properties + +Client-server mode +------------------ + +In client-server mode, the KSQL servers are run separately from the KSQL CLI client. You can deploy servers on remote machines, +VMs, or containers and then the CLI connects to these remote servers. + +You can add or remove servers from the same resource pool during live operations, to elastically scale query processing. You +can use different resource pools to achieve workload isolation. For example, you can deploy separate pools for production +and for testing. + +.. image:: img/client-server.png + +To run KSQL in client-server mode: + +- Start any number of server nodes: + + - Start with default settings: + + .. code:: bash + + $ ./bin/ksql-server-start + + - Start with :ref:`custom + settings `, pointing + KSQL at a specific Kafka cluster (see Streams :ref:`bootstrap servers ` setting): + + .. code:: bash + + $ hostname + my-ksql-server + + $ cat ksql-server.properties + # You must set at least the following two properties + bootstrap.servers=kafka-broker-1:9092 + # Note: `application.id` is not really needed but you must set it + # because of a known issue in the KSQL Developer Preview + application.id=app-id-setting-is-ignored + + # Optional settings below, only for illustration purposes + # The hostname/port on which the server node will listen for client connections + listeners=http://0.0.0.0:8090 + + To start the server node with the settings above: + + .. code:: bash + + $ ./bin/ksql-server-start ksql-server.properties + +- Start any number of CLIs, specifying the desired KSQL server address + as the ``remote`` endpoint: + + .. code:: bash + + $ ./bin/ksql-cli remote http://my-ksql-server:8090 + +KSQL servers that share the same ``command`` topic belong to the same resource pool. By default, a KSQL server uses the +``ksql__commands`` command topic. To assign a server to a different pool, change the ``ksql.command.topic.suffix`` setting in its configuration: + +.. code:: bash + + # Default value: `commands`. + # + # Changing this to `production_commands` as shown below will result in + # the command topic named `ksql__production_commands` being used. + ksql.command.topic.suffix = production_commands + diff --git a/docs/conf.py b/docs/conf.py new file mode 100644 index 000000000000..7b3287a234ae --- /dev/null +++ b/docs/conf.py @@ -0,0 +1,266 @@ +# -*- coding: utf-8 -*- +# +# KSQL documentation build configuration file, created by +# sphinx-quickstart on Wed Dec 17 14:17:15 2014. +# +# This file is execfile()d with the current directory set to its +# containing dir. +# +# Note that not all possible configuration values are present in this +# autogenerated file. +# +# All configuration values have a default; values that are commented out +# serve to show the default. + +import sys +import os + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. +#sys.path.insert(0, os.path.abspath('.')) + +# -- General configuration ------------------------------------------------ + +# If your documentation needs a minimal Sphinx version, state it here. 
+#needs_sphinx = '1.0' + +# Add any Sphinx extension module names here, as strings. They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. +extensions = ['sphinx.ext.ifconfig', 'sphinxcontrib.httpdomain'] + +def setup(app): + app.add_config_value('platform_docs', True, 'env') + +# Even if it has a default, these options need to be specified +platform_docs = False + +# Add any paths that contain templates here, relative to this directory. +templates_path = ['_templates'] + +# The suffix of source filenames. +source_suffix = '.rst' + +# The encoding of source files. +#source_encoding = 'utf-8-sig' + +# The master toctree document. +master_doc = 'index' + +# General information about the project. +project = u'KSQL' +copyright = u'2017, Confluent, Inc.' + +# The version info for the project you're documenting, acts as replacement for +# |version| and |release|, also used in various other places throughout the +# built documents. +# +# The short X.Y version. +version = '4.1' +# The full version, including alpha/beta/rc tags. +release = '4.1.0-SNAPSHOT' + +# The language for content autogenerated by Sphinx. Refer to documentation +# for a list of supported languages. +#language = None + +# There are two options for replacing |today|: either, you set today to some +# non-false value, then it is used: +#today = '' +# Else, today_fmt is used as the format for a strftime call. +#today_fmt = '%B %d, %Y' + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +exclude_patterns = ['_build'] + +# The reST default role (used for this markup: `text`) to use for all +# documents. +#default_role = None + +# If true, '()' will be appended to :func: etc. cross-reference text. +#add_function_parentheses = True + +# If true, the current module name will be prepended to all description +# unit titles (such as .. function::). +#add_module_names = True + +# If true, sectionauthor and moduleauthor directives will be shown in the +# output. They are ignored by default. +#show_authors = False + +# The name of the Pygments (syntax highlighting) style to use. +pygments_style = 'sphinx' + +# A list of ignored prefixes for module index sorting. +#modindex_common_prefix = [] + +# If true, keep warnings as "system message" paragraphs in the built documents. +#keep_warnings = False + + +# -- Options for HTML output ---------------------------------------------- + +import sphinx_rtd_theme + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +html_theme = 'sphinx_rtd_theme' + +# Theme options are theme-specific and customize the look and feel of a theme +# further. For a list of options available for each theme, see the +# documentation. +#html_theme_options = {} + +# Add any paths that contain custom themes here, relative to this directory. +html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] + +# The name for this set of Sphinx documents. If None, it defaults to +# " v documentation". +#html_title = None + +# A shorter title for the navigation bar. Default is the same as html_title. +#html_short_title = None + +# The name of an image file (relative to this directory) to place at the top +# of the sidebar. +#html_logo = None + +# The name of an image file (within the static path) to use as favicon of the +# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 +# pixels large. 
+#html_favicon = None + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". +html_static_path = ['_static'] + +# Add any extra paths that contain custom files (such as robots.txt or +# .htaccess) here, relative to this directory. These files are copied +# directly to the root of the documentation. +#html_extra_path = [] + +# If not '', a 'Last updated on:' timestamp is inserted at every page bottom, +# using the given strftime format. +#html_last_updated_fmt = '%b %d, %Y' + +# If true, SmartyPants will be used to convert quotes and dashes to +# typographically correct entities. +#html_use_smartypants = True + +# Custom sidebar templates, maps document names to template names. +#html_sidebars = {} + +# Additional templates that should be rendered to pages, maps page names to +# template names. +#html_additional_pages = {} + +# If false, no module index is generated. +#html_domain_indices = True + +# If false, no index is generated. +#html_use_index = True + +# If true, the index is split into individual pages for each letter. +#html_split_index = False + +# If true, links to the reST sources are added to the pages. +#html_show_sourcelink = True + +# If true, "Created using Sphinx" is shown in the HTML footer. Default is True. +#html_show_sphinx = True + +# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. +#html_show_copyright = True + +# If true, an OpenSearch description file will be output, and all pages will +# contain a tag referring to it. The value of this option must be the +# base URL from which the finished HTML is served. +#html_use_opensearch = '' + +# This is the file name suffix for HTML files (e.g. ".xhtml"). +#html_file_suffix = None + +# Output file base name for HTML help builder. +htmlhelp_basename = 'KSQLRegistryDoc' + + +# -- Options for LaTeX output --------------------------------------------- + +latex_elements = { +# The paper size ('letterpaper' or 'a4paper'). +#'papersize': 'letterpaper', + +# The font size ('10pt', '11pt' or '12pt'). +#'pointsize': '10pt', + +# Additional stuff for the LaTeX preamble. +#'preamble': '', +} + +# Grouping the document tree into LaTeX files. List of tuples +# (source start file, target name, title, +# author, documentclass [howto, manual, or own class]). +latex_documents = [ + ('index', 'KSQL.tex', u'KSQL Documentation', + u'Confluent, Inc.', 'manual'), +] + +# The name of an image file (relative to this directory) to place at the top of +# the title page. +#latex_logo = None + +# For "manual" documents, if this is true, then toplevel headings are parts, +# not chapters. +#latex_use_parts = False + +# If true, show page references after internal links. +#latex_show_pagerefs = False + +# If true, show URL addresses after external links. +#latex_show_urls = False + +# Documents to append as an appendix to all manuals. +#latex_appendices = [] + +# If false, no module index is generated. +#latex_domain_indices = True + + +# -- Options for manual page output --------------------------------------- + +# One entry per manual page. List of tuples +# (source start file, name, description, authors, manual section). +# man_pages = [ +# ('index', 'schemaregistry', u'Schema Registry Documentation', +# [u'Confluent, Inc.'], 1) +# ] + +# If true, show URL addresses after external links. 
+#man_show_urls = False + + +# -- Options for Texinfo output ------------------------------------------- + +# Grouping the document tree into Texinfo files. List of tuples +# (source start file, target name, title, author, +# dir menu entry, description, category) +# texinfo_documents = [ +# ('index', 'SchemaRegistry', u'Schema Registry Documentation', +# u'Confluent, Inc.', 'SchemaRegistry', 'One line description of project.', +# 'Miscellaneous'), +# ] + +# Documents to append as an appendix to all manuals. +#texinfo_appendices = [] + +# If false, no module index is generated. +#texinfo_domain_indices = True + +# How to display URL addresses: 'footnote', 'no', or 'inline'. +#texinfo_show_urls = 'footnote' + +# If true, do not generate a @detailmenu in the "Top" node's menu. +#texinfo_no_detailmenu = False diff --git a/docs/examples.rst b/docs/examples.rst new file mode 100644 index 000000000000..28eadd0b13e3 --- /dev/null +++ b/docs/examples.rst @@ -0,0 +1,308 @@ +.. _ksql_examples: + +KSQL Examples +============= + +These examples use a ``pageviews`` stream and a ``users`` table. + +.. contents:: Contents + :local: + :depth: 1 + + +Creating streams +---------------- + +Prerequisite: + The corresponding Kafka topics must already exist in your Kafka cluster. + +Create a stream with three columns on the Kafka topic that is named ``pageviews``. It is important to instruct KSQL the format +of the values that are stored in the topic. In this example, the values format is ``DELIMITED``. + +.. code:: sql + + CREATE STREAM pageviews \ + (viewtime BIGINT, \ + userid VARCHAR, \ + pageid VARCHAR) \ + WITH (kafka_topic='pageviews-topic', \ + value_format='DELIMITED'); + +**Associating Kafka message keys:** The above statement does not make +any assumptions about the Kafka message key in the underlying Kafka +topic. However, if the value of the message key in Kafka is the same as +one of the columns defined in the stream in KSQL, you can provide such +information in the WITH clause. For instance, if the Kafka message key +has the same value as the ``pageid`` column, you can write the CREATE +STREAM statement as follows: + +.. code:: sql + + CREATE STREAM pageviews \ + (viewtime BIGINT, \ + userid VARCHAR, \ + pageid VARCHAR) \ + WITH (kafka_topic='pageviews-topic', \ + value_format='DELIMITED', \ + key='pageid'); + +**Associating Kafka message timestamps:** If you want to use the value +of one of the columns as the Kafka message timestamp, you can provide +such information to KSQL in the WITH clause. The message timestamp is +used in window-based operations in KSQL (such as windowed aggregations) +and to support event-time based processing in KSQL. For instance, if you +want to use the value of the ``viewtime`` column as the message +timestamp, you can rewrite the above statement as follows: + +.. code:: sql + + CREATE STREAM pageviews \ + (viewtime BIGINT, \ + userid VARCHAR, \ + pageid VARCHAR) \ + WITH (kafka_topic='pageviews-topic', \ + value_format='DELIMITED', \ + key='pageid', \ + timestamp='viewtime'); + +Creating tables +--------------- + +Prerequisite: + The corresponding Kafka topics must already exist in your Kafka cluster. + +Create a table with several columns. In this example, the table has columns with primitive data +types, a column of ``array`` type, and a column of ``map`` type: + +.. 
code:: sql + + CREATE TABLE users \ + (registertime BIGINT, \ + gender VARCHAR, \ + regionid VARCHAR, \ + userid VARCHAR, \ + interests array, \ + contact_info map) \ + WITH (kafka_topic='users-topic', \ + key = 'userid',\ + value_format='JSON'); + + + +Working with streams and tables +------------------------------- + +Now that you have the ``pageviews`` stream and ``users`` table, take a +look at some example queries that you can write in KSQL. The focus is on +two types of KSQL statements: CREATE STREAM AS SELECT and CREATE TABLE +AS SELECT. For these statements KSQL persists the results of the query +in a new stream or table, which is backed by a Kafka topic. + +Transforming +~~~~~~~~~~~~ + +For this example, imagine you want to create a new stream by +transforming ``pageviews`` in the following way: + +- The ``viewtime`` column value is used as the Kafka message timestamp + in the new stream’s underlying Kafka topic. +- The new stream’s Kafka topic has 5 partitions. +- The data in the new stream is in JSON format. +- Add a new column that shows the message timestamp in human-readable + string format. +- The ``userid`` column is the key for the new stream. + +The following statement will generate a new stream, +``pageviews_transformed`` with the above properties: + +.. code:: sql + + CREATE STREAM pageviews_transformed \ + WITH (timestamp='viewtime', \ + partitions=5, \ + value_format='JSON') AS \ + SELECT viewtime, \ + userid, \ + pageid, \ + TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd HH:mm:ss.SSS') AS timestring \ + FROM pageviews \ + PARTITION BY userid; + +Use a ``[ WHERE condition ]`` clause to select a subset of data. If you +want to route streams with different criteria to different streams +backed by different underlying Kafka topics, e.g. content-based routing, +write multiple KSQL statements as follows: + +.. code:: sql + + CREATE STREAM pageviews_transformed_priority_1 \ + WITH (timestamp='viewtime', \ + partitions=5, \ + value_format='JSON') AS \ + SELECT viewtime, \ + userid, \ + pageid, \ + TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd HH:mm:ss.SSS') AS timestring \ + FROM pageviews \ + WHERE userid='User_1' OR userid='User_2' \ + PARTITION BY userid; + +.. code:: sql + + CREATE STREAM pageviews_transformed_priority_2 \ + WITH (timestamp='viewtime', \ + partitions=5, \ + value_format='JSON') AS \ + SELECT viewtime, \ + userid, \ + pageid, \ + TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd HH:mm:ss.SSS') AS timestring \ + FROM pageviews \ + WHERE userid<>'User_1' AND userid<>'User_2' \ + PARTITION BY userid; + +Joining +~~~~~~~ + +The following query creates a new stream by joining the +``pageviews_transformed`` stream with the ``users`` table: + +.. code:: sql + + CREATE STREAM pageviews_enriched AS \ + SELECT pv.viewtime, \ + pv.userid AS userid, \ + pv.pageid, \ + pv.timestring, \ + u.gender, \ + u.regionid, \ + u.interests, \ + u.contact_info \ + FROM pageviews_transformed pv \ + LEFT JOIN users u ON pv.userid = users.userid; + +Note that by default all the Kafka topics will be read from the current +offset (aka the latest available data); however, in a stream-table join, +the table topic will be read from the beginning. + +Aggregating, windowing, and sessionization +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Now assume that you want to count the number of pageviews per region. +Here is the query that would perform this count: + +.. 
code:: sql + + CREATE TABLE pageviews_per_region AS \ + SELECT regionid, \ + count(*) \ + FROM pageviews_enriched \ + GROUP BY regionid; + +The above query counts the pageviews from the time you start the query +until you terminate the query. Note that we used CREATE TABLE AS SELECT +statement here since the result of the query is a KSQL table. The +results of aggregate queries in KSQL are always a table because it +computes the aggregate for each key (and possibly for each window per +key) and *updates* these results as it processes new input data. + +KSQL supports aggregation over WINDOW too. Let’s rewrite the above query +so that we compute the pageview count per region every 1 minute: + +.. code:: sql + + CREATE TABLE pageviews_per_region_per_minute AS \ + SELECT regionid, \ + count(*) \ + FROM pageviews_enriched \ + WINDOW TUMBLING (SIZE 1 MINUTE) \ + GROUP BY regionid; + +If you want to count the pageviews for only “Region_6” by female users +for every 30 seconds, you can change the above query as the following: + +.. code:: sql + + CREATE TABLE pageviews_per_region_per_30secs AS \ + SELECT regionid, \ + count(*) \ + FROM pageviews_enriched \ + WINDOW TUMBLING (SIZE 30 SECONDS) \ + WHERE UCASE(gender)='FEMALE' AND LCASE(regionid)='region_6' \ + GROUP BY regionid; + +UCASE and LCASE functions in KSQL are used to convert the values of +gender and regionid columns to upper and lower case, so that you can +match them correctly. KSQL also supports LIKE operator for prefix, +suffix and substring matching. + +KSQL supports HOPPING windows and SESSION windows too. The following +query is the same query as above that computes the count for hopping +window of 30 seconds that advances by 10 seconds: + +.. code:: sql + + CREATE TABLE pageviews_per_region_per_30secs10secs AS \ + SELECT regionid, \ + count(*) \ + FROM pageviews_enriched \ + WINDOW HOPPING (SIZE 30 SECONDS, ADVANCE BY 10 SECONDS) \ + WHERE UCASE(gender)='FEMALE' AND LCASE (regionid) LIKE '%_6' \ + GROUP BY regionid; + +The next statement counts the number of pageviews per region for session +windows with a session inactivity gap of 60 seconds. In other words, you +are *sessionizing* the input data and then perform the +counting/aggregation step per region. + +.. code:: sql + + CREATE TABLE pageviews_per_region_per_session AS \ + SELECT regionid, \ + count(*) \ + FROM pageviews_enriched \ + WINDOW SESSION (60 SECONDS) \ + GROUP BY regionid; + +Working with arrays and maps +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The ``interests`` column in the ``users`` table is an ``array`` of +strings that represents the interest of each user. The ``contact_info`` +column is a string-to-string ``map`` that represents the following +contact information for each user: phone, city, state, and zipcode. + +The following query will create a new stream from ``pageviews_enriched`` +that includes the first interest of each user along with the city and +zipcode for each user: + +.. code:: sql + + CREATE STREAM pageviews_interest_contact AS \ + SELECT interests[0] AS first_interest, \ + contact_info['zipcode'] AS zipcode, \ + contact_info['city'] AS city, \ + viewtime, \ + userid, \ + pageid, \ + timestring, \ + gender, \ + regionid \ + FROM pageviews_enriched; + +Running KSQL +------------ + +KSQL supports various :ref:`modes of +operation `, including a standalone +mode and a client-server mode. + +Additionally, you can also instruct KSQL to execute a single statement +from the command line. 
The following example command runs the given +``SELECT`` statement and show the results in the terminal. In this +particular case, the query will run until 5 records have been found, and +then terminate. + +.. code:: shell + + $ ksql-cli local --exec "SELECT * FROM pageviews LIMIT 5;" diff --git a/docs/faq.rst b/docs/faq.rst new file mode 100644 index 000000000000..fa386bf95201 --- /dev/null +++ b/docs/faq.rst @@ -0,0 +1,131 @@ +.. _ksql_faq: + +Frequently Asked Questions +========================== + +.. contents:: Contents + :local: + :depth: 1 + +============================== +What are the benefits of KSQL? +============================== + +KSQL allows you to query, read, write, and process data in Apache Kafka +in real-time and at scale using intuitive SQL-like syntax. KSQL does not +require proficiency with a programming language such as Java or Scala, +and you don’t have to install a separate processing cluster technology. + +============================================ +What are the technical requirements of KSQL? +============================================ + +KSQL only requires: + +1. A Java runtime environment +2. Access to an Apache Kafka cluster for reading and writing data in + real-time. The cluster can be on-premises or in the cloud. KSQL works + with clusters running vanilla Apache Kafka as well as with clusters + running the Kafka versions included in Confluent Platform. + +We recommend the use of `Confluent +Platform `__ or +`Confluent Cloud `__ for +running Apache Kafka. + +================================================ +Is KSQL owned by the Apache Software Foundation? +================================================ + +No, KSQL is owned and maintained by `Confluent +Inc. `__ as part of its free `Confluent Open +Source `__ +product. + +==================================================== +How does KSQL compare to Apache Kafka’s Streams API? +==================================================== + +KSQL is complementary to the Kafka Streams API, and indeed executes +queries through Kafka Streams applications. One of the key benefits of +KSQL is that it does not require the user to develop any code in Java or +Scala. This enables users to use a SQL-like interface alone to construct +streaming ETL pipelines, as well as responding to a real-time, +continuous business requests. For full-fledged stream processing +applications Kafka Streams remains a more appropriate choice. As with +many technologies, each has its sweet-spot based on technical +requirements, mission-criticality, and user skillset. + +======================================================================================================================= +Does KSQL work with vanilla Apache Kafka clusters, or does it require the Kafka version included in Confluent Platform? +======================================================================================================================= + +KSQL works with both vanilla Apache Kafka clusters as well as with the +Kafka versions included in Confluent Platform. + +============================================================ +Does KSQL support Kafka’s exactly-once processing semantics? +============================================================ + +Yes, KSQL supports exactly-once processing, which means it will compute +correct results even in the face of failures such as machine crashes. + +============================= +Is KSQL ready for production? +============================= + +KSQL is a technical preview at this point in time. 
We do not yet +recommend its use for production purposes. + +============================================================== +Can I use KSQL with my favorite data format (e.g. JSON, Avro)? +============================================================== + +KSQL currently supports formats: + +- DELIMITED (e.g. CSV) +- JSON + +*Support for Apache Avro is expected soon.* + +==================================== +Is KSQL fully compliant to ANSI SQL? +==================================== + +KSQL is a dialect inspired by ANSI SQL. It has some differences because +it is geared at processing streaming data. For example, ANSI SQL has no +notion of “windowing” for use cases such as performing aggregations on +data grouped into 5-minute windows, which is a commonly required +functionality in the streaming world. + +===================================== +How do I shutdown a KSQL environment? +===================================== + +- To stop DataGen tasks that were started with the ``-daemon`` flag + (cf. :ref:`ksql_clickstream`). + + .. code:: bash + + $ jps | grep DataGen + 25379 DataGen + $ kill 25379 + +- Exit KSQL. + + .. code:: bash + + ksql> exit + +- Stop Confluent Platform by shutting down all services including + Kafka. + + .. code:: bash + + $ confluent stop + +- To remove all data, topics, and streams: + + .. code:: bash + + $ confluent destroy diff --git a/docs/img/grafana-success.png b/docs/img/grafana-success.png new file mode 100644 index 000000000000..1d43171be8cf Binary files /dev/null and b/docs/img/grafana-success.png differ diff --git a/docs/quickstart/ksql-quickstart-schemas.jpg b/docs/img/ksql-quickstart-schemas.jpg similarity index 100% rename from docs/quickstart/ksql-quickstart-schemas.jpg rename to docs/img/ksql-quickstart-schemas.jpg diff --git a/docs/index.rst b/docs/index.rst new file mode 100644 index 000000000000..d8f86da0e947 --- /dev/null +++ b/docs/index.rst @@ -0,0 +1,65 @@ +.. _ksql_home: + +KSQL +==== + +.. toctree:: + :maxdepth: 3 + :hidden: + + quickstart/index + installation/index + concepts + syntax-reference + ksql-clickstream-demo/index + examples + faq + +.. important:: + + This release is a developer preview. It is strongly recommended that you test before running KSQL against a production Kafka cluster. + +KSQL is an open source streaming SQL engine for Apache Kafka™. It provides a simple interactive SQL interface for stream +processing on Kafka, without the need to write code in a programming language such as Java or Python. KSQL is scalable, reliable, +and real-time. It supports a wide range of streaming operations, including aggregations, joins, windowing, and sessionization. + +=============== +Getting started +=============== + +----------- +Quick Start +----------- +Create a simple workflow using KSQL and write streaming queries against data in Kafka with the :ref:`ksql_quickstart`. + +------------------------- +Clickstream Analysis Demo +------------------------- +Learn how to analyze data feeds and build a real-time dashboard for reporting and alerting with the :ref:`ksql_clickstream`. + +--------------- +KSQL Screencast +--------------- +Watch `a screencast of the KSQL demo `_ on YouTube. + +.. raw:: html + +
+ +
+ + +--------- +Use cases +--------- + +Common KSQL use cases are: + +- Streaming ETL: Apache Kafka is a popular choice for powering data pipelines. KSQL makes it simple to transform data within the pipeline, readying messages to cleanly land in another system. +- Real-time Monitoring and Analytics: track, understand, and manage infrastructure, applications, and data feeds by quickly building real-time dashboards, generating metrics, and creating custom alerts and messages. +- Data exploration and discovery: navigate and browse through your data in Kafka. +- Anomaly detection: identify patterns and spot anomalies in real-time data with millisecond latency, allowing you to properly surface out of the ordinary events and to handle fraudulent activities separately. +- Personalization: create data driven real-time experiences and insight for users. +- Sensor data and IoT: understand and deliver sensor data how and where it needs to be. +- Customer 360-view: achieve a comprehensive understanding of your customers across every interaction through a variety of channels, where new information is continuously incorporated in real-time. + diff --git a/docs/installation/config-ksql.rst b/docs/installation/config-ksql.rst new file mode 100644 index 000000000000..29f9123086ca --- /dev/null +++ b/docs/installation/config-ksql.rst @@ -0,0 +1,79 @@ +.. _configuring-ksql: + +Configuring KSQL +================ + +You can set configuration properties for KSQL and your queries with the +SET statement. This includes :cp-javadoc:`settings for Kafka’s Streams +API |streams/javadocs/index.html` (e.g., +``cache.max.bytes.buffering``) as well as settings for Kafka’s :cp-javadoc:`producer +client |clients/javadocs/org/apache/kafka/clients/producer/ProducerConfig.html` and +:cp-javadoc:`consumer +client |clients/javadocs/org/apache/kafka/clients/consumer/ConsumerConfig.html` +(e.g., ``auto.offset.reset``). + +The basic syntax is: + +.. code:: sql + + SET ''=''; + +The property name and value must be enclosed in single quotes. + +**Tip:** You can view the current settings with the ``SHOW PROPERTIES`` command. + +After a property has been set, it remains in effect for the remainder +of the KSQL CLI session, or until you issue another SET statement to change +it. + +.. caution:: + When using KSQL in Docker, the properties file must be available inside the Docker container. If you + don’t want to customize your Docker setup so that it contains an appropriate properties file, you can use the SET statement. + +Examples +-------- + +Common configuration properties that you might want to change from their +default values include: + +- :cp-javadoc:`auto.offset.reset |clients/javadocs/org/apache/kafka/clients/consumer/ConsumerConfig.html#AUTO_OFFSET_RESET_CONFIG`: + The default value in KSQL is ``latest`` meaning all the Kafka topics + will be read from the current offset (aka latest available data). You + can change it using the following statement: + + .. code:: bash + + ksql> SET 'auto.offset.reset'='earliest'; + +- :cp-javadoc:`commit.interval.ms |streams/javadocs/org/apache/kafka/streams/StreamsConfig.html#COMMIT_INTERVAL_MS_CONFIG`: + The default value in KSQL is ``2000``. Here is an example to change + the value to ``5000``: + + .. code:: bash + + ksql> SET 'commit.interval.ms'='5000'; + +- :cp-javadoc:`cache.max.bytes.buffering |streams/javadocs/org/apache/kafka/streams/StreamsConfig.html#CACHE_MAX_BYTES_BUFFERING_CONFIG`: + The default value in KSQL is ``10000000`` (~ 10 MB). 
Here is an example to change the value to ``20000000``: + + .. code:: bash + + ksql> SET 'cache.max.bytes.buffering'='20000000'; + + +You can also use a properties file instead of the SET statement. The +syntax of properties files follow Java conventions, which are slightly +different to the syntax of the SET statement above. + +.. code:: bash + + # Show the example contents of a properties file + $ cat ksql.properties + auto.offset.reset=earliest + + # Start KSQL in standalone mode with the custom properties above + $ ksql-cli local --properties-file ./ksql.properties + + # Start a KSQL server node (for client-server mode) with the custom properties above + $ ksql-server-start ./ksql.properties + diff --git a/docs/installation/index.rst b/docs/installation/index.rst new file mode 100644 index 000000000000..b9381fdf9102 --- /dev/null +++ b/docs/installation/index.rst @@ -0,0 +1,23 @@ +.. install_ksql: + +Installing KSQL +--------------- + +.. toctree:: + :maxdepth: 3 + :hidden: + + config-ksql + +---------------- +Interoperability +---------------- + ++--------------------+------------------+------------------+ +| KSQL | 0.1 | 0.2 | ++====================+==================+==================+ +| Apache Kafka | 0.10.1 and later | 0.10.1 and later | ++--------------------+------------------+------------------+ +| Confluent Platform | 3.1.0 and later | 3.1.0 and later | ++--------------------+------------------+------------------+ + diff --git a/docs/ksql-clickstream-demo/Dockerfile b/docs/ksql-clickstream-demo/Dockerfile new file mode 100644 index 000000000000..06d8de88427e --- /dev/null +++ b/docs/ksql-clickstream-demo/Dockerfile @@ -0,0 +1,31 @@ +# https://confluentinc.atlassian.net/browse/KSQL-292 + +ARG DOCKER_REGISTRY + +FROM ${DOCKER_REGISTRY}confluentinc/docker-demo-base:3.3.1 + +ARG KSQL_VERSION +ARG ARTIFACT_ID + +EXPOSE 3000 + +ENV ES_JAVA_OPTS="-Xms512M -Xmx512M" +ENV KSQL_CLASSPATH=/usr/share/java/${ARTIFACT_ID}/${ARTIFACT_ID}-${KSQL_VERSION}-standalone.jar +ENV KSQL_CONFIG_DIR="/etc/ksql" +ENV KSQL_LOG4J_OPTS="-Dlog4j.configuration=file:/etc/ksql/log4j-rolling.properties" + +RUN wget -q https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana_4.4.3_amd64.deb \ + && wget -q https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.2.deb \ + && dpkg -i grafana_4.4.3_amd64.deb \ + && dpkg -i elasticsearch-5.5.2.deb \ + && rm grafana_4.4.3_amd64.deb \ + && rm elasticsearch-5.5.2.deb + +ADD target/${ARTIFACT_ID}-${KSQL_VERSION}-standalone.jar /usr/share/java/${ARTIFACT_ID}/${ARTIFACT_ID}-${KSQL_VERSION}-standalone.jar +ADD target/${ARTIFACT_ID}-${KSQL_VERSION}-package/bin/* /usr/bin/ +ADD target/${ARTIFACT_ID}-${KSQL_VERSION}-package/etc/* /etc/ksql/ + +ADD demo/*sh /usr/share/doc/ksql-clickstream-demo/ +ADD demo/*sql /usr/share/doc/ksql-clickstream-demo/ +ADD demo/*json /usr/share/doc/ksql-clickstream-demo/ +ADD demo/connect-config/null-filter-4.0.0-SNAPSHOT.jar /usr/share/java/kafka-connect-elasticsearch/ diff --git a/docs/ksql-clickstream-demo/docker-clickstream.rst b/docs/ksql-clickstream-demo/docker-clickstream.rst new file mode 100644 index 000000000000..a05ea2e628cc --- /dev/null +++ b/docs/ksql-clickstream-demo/docker-clickstream.rst @@ -0,0 +1,381 @@ +.. _ksql_clickstream_docker: + +Clickstream Analysis with KSQL running in Docker +================================================ + +These steps will guide you through how to setup your environment and run +the clickstream analysis demo from a Docker container. 
For instructions +without using Docker, see :ref:`this +documentation `. + +Prerequisites +------------- + +- Docker + + - `macOS `__ + - `All platforms `__ + + **Important:** Docker should be configured to run with 5 GB of memory (the default is 2 GB). + +- `Git `__ + +1. Download and start the KSQL clickstream container. This container + image is large and contains Confluent, Grafana, and Elasticsearch. + Depending on your network speed, this may take up to 10-15 minutes. + The ``-p`` flag will forward the Grafana dashboard to port 33000 on + your local host. + + .. code:: bash + + $ docker run -p 33000:3000 -it confluentinc/ksql-clickstream-demo bash + + Your output should resemble: + + .. code:: bash + + Unable to find image 'confluentinc/ksql-clickstream-demo:latest' locally + latest: Pulling from confluentinc/ksql-clickstream-demo + ad74af05f5a2: Already exists + d02e292e7b5e: Already exists + 8de7f5c81ab0: Already exists + ed0b76dc2730: Already exists + cfc44fa8a002: Already exists + d9ece951ea0c: Pull complete + f26010779356: Pull complete + c9dad5440731: Pull complete + 935591799d9d: Pull complete + 696df0f65482: Pull complete + 14fd98e52325: Pull complete + fcbeb94bace2: Pull complete + 32cca4f1567d: Pull complete + 5df0d25e7260: Pull complete + e16097edc4fc: Pull complete + 72b33b348958: Pull complete + 015da01a41b0: Pull complete + 80e29f47abe0: Pull complete + Digest: sha256:f3b2b19668b851d1300f77aa8c2236a126b628b911578cc688c7e0de442c1cd3 + Status: Downloaded newer image for confluentinc/ksql-clickstream-demo:latest + root@d98186dd8d6c:/# + + You should now be in the Docker container. + +2. Start the container services. + + - Elasticsearch + + .. code:: bash + + $ /etc/init.d/elasticsearch start + + - Grafana + + .. code:: bash + + $ /etc/init.d/grafana-server start + + - Confluent Platform + + .. code:: bash + + $ confluent start + +3. From your terminal, create the clickStream data using the + ksql-datagen utility. This stream will run continuously until you + terminate. + + **Tip:** Because of shell redirection, this command does not print a + newline and so it might look like it’s still in the foreground. The + process is running as a daemon, so just press return again to see + the shell prompt. + + .. code:: bash + + $ ksql-datagen -daemon quickstart=clickstream format=json topic=clickstream maxInterval=100 iterations=500000 + + Your output should resemble: + + :: + + Writing console output to /tmp/ksql-logs/ksql.out + +4. From your terminal, create the status codes using the ksql-datagen + utility. This stream runs once to populate the table. + + .. code:: bash + + $ ksql-datagen quickstart=clickstream_codes format=json topic=clickstream_codes maxInterval=20 iterations=100 + + Your output should resemble: + + :: + + 200 --> ([ 200 | 'Successful' ]) + 302 --> ([ 302 | 'Redirect' ]) + 200 --> ([ 200 | 'Successful' ]) + 406 --> ([ 406 | 'Not acceptable' ]) + ... + +5. From your terminal, create a set of users using ksql-datagen + utility. This stream runs once to populate the table. + + .. code:: bash + + $ ksql-datagen quickstart=clickstream_users format=json topic=clickstream_users maxInterval=10 iterations=1000 + + Your output should resemble: + + :: + + 1 --> ([ 1 | 'GlenAlan_23344' | 1424796387808 | 'Curran' | 'Lalonde' | 'Palo Alto' | 'Gold' ]) + 2 --> ([ 2 | 'ArlyneW8ter' | 1433932319457 | 'Oriana' | 'Vanyard' | 'London' | 'Platinum' ]) + 3 --> ([ 3 | 'akatz1022' | 1478233258664 | 'Ferd' | 'Trice' | 'Palo Alto' | 'Platinum' ]) + ... + +6. 
Launch the KSQL CLI in local mode. + + 1. Start the KSQL server. + + .. code:: bash + + $ ksql-server-start /etc/ksql/ksqlserver.properties > /tmp/ksql-logs/ksql-server.log 2>&1 & + + 2. Start the CLI on port 8080. + + .. code:: bash + + $ ksql-cli remote http://localhost:8080 + + You should now be in the KSQL CLI. + + .. code:: bash + + ====================================== + = _ __ _____ ____ _ = + = | |/ // ____|/ __ \| | = + = | ' /| (___ | | | | | = + = | < \___ \| | | | | = + = | . \ ____) | |__| | |____ = + = |_|\_\_____/ \___\_\______| = + = = + = Streaming SQL Engine for Kafka = + Copyright 2017 Confluent Inc. + + CLI v0.2, Server v0.1 located at http://localhost:9098 + + Having trouble? Type 'help' (case-insensitive) for a rundown of how things work! + + ksql> + +7. From the the KSQL CLI, load the ``clickstream.sql`` schema file that + will run the demo app. + + **Important:** Before running this step, you must have already run + ksql-datagen utility to create the clickstream data, status codes, + and set of users. + + :: + + ksql> RUN SCRIPT '/usr/share/doc/ksql-clickstream-demo/clickstream-schema.sql'; + + The output should resemble: + + :: + + Message + ------------------------------------ + Executing statement + ksql> + +8. From the the KSQL CLI, verify that the tables are created. + + :: + + ksql> LIST TABLES; + + Your output should resemble: + + :: + + Table Name | Kafka Topic | Format | Windowed + ----------------------------------------------------------------------------- + WEB_USERS | clickstream_users | JSON | false + ERRORS_PER_MIN_ALERT | ERRORS_PER_MIN_ALERT | JSON | true + CLICKSTREAM_CODES_TS | CLICKSTREAM_CODES_TS | JSON | false + USER_IP_ACTIVITY | USER_IP_ACTIVITY | JSON | true + CLICKSTREAM_CODES | clickstream_codes | JSON | false + PAGES_PER_MIN | PAGES_PER_MIN | JSON | true + CLICK_USER_SESSIONS | CLICK_USER_SESSIONS | JSON | true + ENRICHED_ERROR_CODES_COUNT | ENRICHED_ERROR_CODES_COUNT | JSON | true + EVENTS_PER_MIN_MAX_AVG | EVENTS_PER_MIN_MAX_AVG | JSON | true + ERRORS_PER_MIN | ERRORS_PER_MIN | JSON | true + EVENTS_PER_MIN | EVENTS_PER_MIN | JSON | true + +9. From the the KSQL CLI, verify that the streams are created. + + :: + + ksql> LIST STREAMS; + + Your output should resemble: + + :: + + Stream Name | Kafka Topic | Format + ---------------------------------------------------------------- + USER_CLICKSTREAM | USER_CLICKSTREAM | JSON + EVENTS_PER_MIN_MAX_AVG_TS | EVENTS_PER_MIN_MAX_AVG_TS | JSON + ERRORS_PER_MIN_TS | ERRORS_PER_MIN_TS | JSON + EVENTS_PER_MIN_TS | EVENTS_PER_MIN_TS | JSON + ENRICHED_ERROR_CODES | ENRICHED_ERROR_CODES | JSON + ERRORS_PER_MIN_ALERT_TS | ERRORS_PER_MIN_ALERT_TS | JSON + CLICK_USER_SESSIONS_TS | CLICK_USER_SESSIONS_TS | JSON + PAGES_PER_MIN_TS | PAGES_PER_MIN_TS | JSON + ENRICHED_ERROR_CODES_TS | ENRICHED_ERROR_CODES_TS | JSON + USER_IP_ACTIVITY_TS | USER_IP_ACTIVITY_TS | JSON + CUSTOMER_CLICKSTREAM | CUSTOMER_CLICKSTREAM | JSON + CLICKSTREAM | clickstream | JSON + +10. From the the KSQL CLI, verify that data is being streamed through + various tables and streams. + + **View clickstream data** + + .. code:: bash + + ksql> SELECT * FROM CLICKSTREAM LIMIT 5; + + Your output should resemble: + + .. 
code:: bash + + 1503585407989 | 222.245.174.248 | 1503585407989 | 24/Aug/2017:07:36:47 -0700 | 233.90.225.227 | GET /site/login.html HTTP/1.1 | 407 | 19 | 4096 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) + 1503585407999 | 233.168.257.122 | 1503585407999 | 24/Aug/2017:07:36:47 -0700 | 233.173.215.103 | GET /site/user_status.html HTTP/1.1 | 200 | 15 | 14096 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) + 1503585408009 | 222.168.57.122 | 1503585408009 | 24/Aug/2017:07:36:48 -0700 | 111.249.79.93 | GET /images/track.png HTTP/1.1 | 406 | 22 | 4096 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) + 1503585408019 | 122.145.8.244 | 1503585408019 | 24/Aug/2017:07:36:48 -0700 | 122.249.79.233 | GET /site/user_status.html HTTP/1.1 | 404 | 6 | 4006 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) + 1503585408029 | 222.152.45.45 | 1503585408029 | 24/Aug/2017:07:36:48 -0700 | 222.249.79.93 | GET /images/track.png HTTP/1.1 | 200 | 29 | 14096 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 + LIMIT reached for the partition. + Query terminated + + **View the events per minute** + + .. code:: bash + + ksql> SELECT * FROM EVENTS_PER_MIN_TS LIMIT 5; + + Your output should resemble: + + .. code:: bash + + 1503585450000 | 29^�8 | 1503585450000 | 29 | 19 + 1503585450000 | 37^�8 | 1503585450000 | 37 | 25 + 1503585450000 | 8^�8 | 1503585450000 | 8 | 35 + 1503585450000 | 36^�8 | 1503585450000 | 36 | 14 + 1503585450000 | 24^�8 | 1503585450000 | 24 | 22 + LIMIT reached for the partition. + Query terminated + + **View pages per minute** + + .. code:: bash + + ksql> SELECT * FROM PAGES_PER_MIN LIMIT 5; + + Your output should resemble: + + .. code:: bash + + 1503585475000 | 4 : Window{start=1503585475000 end=-} | 4 | 14 + 1503585480000 | 25 : Window{start=1503585480000 end=-} | 25 | 9 + 1503585480000 | 16 : Window{start=1503585480000 end=-} | 16 | 6 + 1503585475000 | 25 : Window{start=1503585475000 end=-} | 25 | 20 + 1503585480000 | 37 : Window{start=1503585480000 end=-} | 37 | 6 + LIMIT reached for the partition. + Query terminated + +11. Go to your terminal and send the KSQL tables to Elasticsearch and + Grafana. + + 1. Exit the KSQL CLI with ``CTRL+D``. + + 2. From your terminal, navigate to the demo directory: + + .. code:: bash + + $ cd /usr/share/doc/ksql-clickstream-demo/ + + 3. Run this command to send the KSQL tables to Elasticsearch and + Grafana: + + .. code:: bash + + $ ./ksql-tables-to-grafana.sh + + Your output should resemble: + + :: + + Loading Clickstream-Demo TABLES to Confluent-Connect => Elastic => Grafana datasource + Logging to: /tmp/ksql-connect.log + Charting CLICK_USER_SESSIONS_TS + Charting USER_IP_ACTIVITY_TS + Charting CLICKSTREAM_STATUS_CODES_TS + Charting ENRICHED_ERROR_CODES_TS + Charting ERRORS_PER_MIN_ALERT_TS + Charting ERRORS_PER_MIN_TS + Charting EVENTS_PER_MIN_MAX_AVG_TS + Charting EVENTS_PER_MIN_TS + Charting PAGES_PER_MIN_TS + Navigate to http://localhost:3000/dashboard/db/click-stream-analysis + + 4. From your terminal, load the dashboard into Grafana. + + .. code:: bash + + $ ./clickstream-analysis-dashboard.sh + + Your output should resemble: + + .. code:: bash + + Loading Grafana ClickStream Dashboard + {"slug":"click-stream-analysis","status":"success","version":5} + +12. Go to your browser and view the Grafana output at + http://localhost:33000/dashboard/db/click-stream-analysis. 
You can log in with user ID ``admin`` and password ``admin``.
+
+    **Important:** If you already have the Grafana UI open, you may need to
+    enter the specific clickstream URL:
+    http://localhost:33000/dashboard/db/click-stream-analysis.
+
+    .. figure:: grafana-success.png
+       :alt: Grafana UI success
+
+       Grafana UI success
+
+**About:** This dashboard demonstrates a range of streaming functionality;
+the title of each panel describes the type of stream processing required to
+generate its data. For example, the large chart in the middle shows
+web-resource requests on a per-username basis using a session window, where a
+session expires after 300 seconds of inactivity. Editing a panel lets you view
+its data source, which is named after the streams and tables captured in the
+``clickstream-schema.sql`` file.
+
+**Interesting things to try:**
+
+* Understand how the ``clickstream-schema.sql`` file is structured. We use a
+  DataGen.KafkaTopic.clickstream -> Stream -> Table (for windowing and
+  analytics with GROUP BY) -> Table (to add EVENT_TS for the time index) ->
+  Elasticsearch/Connect topic.
+* Run the ``LIST TOPICS;`` command to see where data is persisted.
+* Run the KSQL CLI ``history`` command.
+
+Troubleshooting
+---------------
+
+- Check the Data Sources page in Grafana.
+
+  - If your data source is shown, select it, scroll to the bottom, and
+    click the **Save & Test** button. This will indicate whether your
+    data source is valid.
diff --git a/docs/ksql-clickstream-demo/index.rst b/docs/ksql-clickstream-demo/index.rst
new file mode 100644
index 000000000000..5bae71d2239a
--- /dev/null
+++ b/docs/ksql-clickstream-demo/index.rst
@@ -0,0 +1,31 @@
+.. _ksql_clickstream:
+
+Clickstream Analysis Demo
+=========================
+
+Clickstream analysis is the process of collecting, analyzing, and
+reporting aggregate data about which pages a website visitor visits and
+in what order. The path the visitor takes through a website is called the
+clickstream.
+
+This demo focuses on building real-time analytics of users to determine:
+
+* General website analytics, such as hit count and visitors
+* Bandwidth use
+* Mapping user-IP addresses to actual users and their location
+* Detection of high-bandwidth user sessions
+* Error-code occurrence and enrichment
+* Sessionization to track user sessions and understand behavior (such as per-user-session bandwidth, per-user-session hits, etc.)
+
+The demo uses standard streaming functions (e.g., MIN, MAX), as well as
+enrichment using child tables, table-stream joins, and different types of
+windowing functionality.
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Table of Contents
+
+   non-docker-clickstream
+   docker-clickstream
+
+
diff --git a/docs/ksql-clickstream-demo/non-docker-clickstream.rst b/docs/ksql-clickstream-demo/non-docker-clickstream.rst
new file mode 100644
index 000000000000..53aebe5b0e2a
--- /dev/null
+++ b/docs/ksql-clickstream-demo/non-docker-clickstream.rst
@@ -0,0 +1,364 @@
+.. _ksql_clickstream_non_docker:
+
+Clickstream Analysis with KSQL Running Locally
+==============================================
+
+These steps guide you through setting up your environment and running the
+clickstream analysis demo.
+
+Prerequisites
+-------------
+
+- :ref:`Confluent 4.0.0 ` locally
+
+  - Make sure you only have one broker running on the host
+
+- `Elasticsearch `__
+- `Grafana `__
+- `Git `__
+- `Maven `__
+- Java: Minimum version 1.8. Install Oracle Java JRE or JDK >= 1.8 on
+  your local machine.
+
+1. 
Clone the Confluent KSQL repository. + + .. code:: bash + + $ git clone git@github.com:confluentinc/ksql.git + +2. Change directory to the ``ksql`` directory and compile the KSQL + code. + + .. code:: bash + + $ cd ksql + $ mvn clean compile install -DskipTests + +3. Copy the Kafka Connect Elasticsearch configuration file + (``ksql/ksql-clickstream-demo/demo/connect-config/null-filter-4.0.0-SNAPSHOT.jar``) + to your Confluent installation ``share`` directory + (``confluent-3.3.0/share/java/kafka-connect-elasticsearch/``). + + .. code:: bash + + cp ksql-clickstream-demo/demo/connect-config/null-filter-4.0.0-SNAPSHOT.jar /share/java/kafka-connect-elasticsearch/ + +4. From your terminal, start the Confluent Platform. It should be + running on default port 8083. + + .. code:: bash + + $ /bin/confluent start + + The output should resemble: + + .. code:: bash + + Starting zookeeper + zookeeper is [UP] + Starting kafka + kafka is [UP] + Starting schema-registry + schema-registry is [UP] + Starting kafka-rest + kafka-rest is [UP] + Starting connect + connect is [UP] + +5. From your terminal, start the Elastic and Grafana servers. + ElasticSearch should be running on the default port 9200. Grafana + should be running on the default port 3000. + + - `Start + Elastic `__ + - `Start Grafana `__ + +6. From your terminal, create the clickStream data using the + ksql-datagen utility. This stream will run continuously until you + terminate. + + **Tip:** Because of shell redirection, this command does not print a + newline and so it might look like it’s still in the foreground. The + process is running as a daemon, so just press return again to see + the shell prompt. + + .. code:: bash + + $ /bin/ksql-datagen -daemon quickstart=clickstream format=json topic=clickstream maxInterval=100 iterations=500000 + + Your output should resemble: + + .. code:: bash + + Writing console output to /tmp/ksql-logs/ksql.out + +7. From your terminal, create the status codes using the ksql-datagen + utility. This stream runs once to populate the table. + + .. code:: bash + + $ /bin/ksql-datagen quickstart=clickstream_codes format=json topic=clickstream_codes maxInterval=20 iterations=100 + + Your output should resemble: + + .. code:: bash + + 200 --> ([ 200 | 'Successful' ]) + 302 --> ([ 302 | 'Redirect' ]) + 200 --> ([ 200 | 'Successful' ]) + 406 --> ([ 406 | 'Not acceptable' ]) + ... + +8. From your terminal, create a set of users using ksql-datagen + utility. This stream runs once to populate the table. + + .. code:: bash + + $ /bin/ksql-datagen quickstart=clickstream_users format=json topic=clickstream_users maxInterval=10 iterations=1000 + + Your output should resemble: + + .. code:: bash + + 1 --> ([ 1 | 'GlenAlan_23344' | 1424796387808 | 'Curran' | 'Lalonde' | 'Palo Alto' | 'Gold' ]) + 2 --> ([ 2 | 'ArlyneW8ter' | 1433932319457 | 'Oriana' | 'Vanyard' | 'London' | 'Platinum' ]) + 3 --> ([ 3 | 'akatz1022' | 1478233258664 | 'Ferd' | 'Trice' | 'Palo Alto' | 'Platinum' ]) + ... + +9. Launch the KSQL CLI in local mode. + + .. code:: bash + + $ /bin/ksql-cli local + + You should see the KSQL CLI welcome screen. + + .. code:: bash + + ====================================== + = _ __ _____ ____ _ = + = | |/ // ____|/ __ \| | = + = | ' /| (___ | | | | | = + = | < \___ \| | | | | = + = | . \ ____) | |__| | |____ = + = |_|\_\_____/ \___\_\______| = + = = + = Streaming SQL Engine for Kafka = + Copyright 2017 Confluent Inc. + + CLI v0.2, Server v0.1 located at http://localhost:9098 + + Having trouble? 
Type 'help' (case-insensitive) for a rundown of how things work! + + ksql> + +10. From the the KSQL CLI, load the ``clickstream.sql`` schema file that + will run the demo app. + + **Important:** Before running this step, you must have already run + ksql-datagen utility to create the clickstream data, status codes, + and set of users. + + :: + + ksql> RUN SCRIPT 'ksql-clickstream-demo/demo/clickstream-schema.sql'; + + The output should resemble: + + .. code:: bash + + Message + ------------------------------------ + Executing statement + ksql> + +11. From the the KSQL CLI, verify that the tables are created. + + .. code:: bash + + ksql> LIST TABLES; + + Your output should resemble: + + .. code:: bash + + Table Name | Kafka Topic | Format | Windowed + ----------------------------------------------------------------------------- + WEB_USERS | clickstream_users | JSON | false + ERRORS_PER_MIN_ALERT | ERRORS_PER_MIN_ALERT | JSON | true + CLICKSTREAM_CODES_TS | CLICKSTREAM_CODES_TS | JSON | false + USER_IP_ACTIVITY | USER_IP_ACTIVITY | JSON | true + CLICKSTREAM_CODES | clickstream_codes | JSON | false + PAGES_PER_MIN | PAGES_PER_MIN | JSON | true + CLICK_USER_SESSIONS | CLICK_USER_SESSIONS | JSON | true + ENRICHED_ERROR_CODES_COUNT | ENRICHED_ERROR_CODES_COUNT | JSON | true + EVENTS_PER_MIN_MAX_AVG | EVENTS_PER_MIN_MAX_AVG | JSON | true + ERRORS_PER_MIN | ERRORS_PER_MIN | JSON | true + EVENTS_PER_MIN | EVENTS_PER_MIN | JSON | true + +12. From the the KSQL CLI, verify that the streams are created. + + .. code:: bash + + ksql> LIST STREAMS; + + Your output should resemble: + + .. code:: bash + + Stream Name | Kafka Topic | Format + ---------------------------------------------------------------- + USER_CLICKSTREAM | USER_CLICKSTREAM | JSON + EVENTS_PER_MIN_MAX_AVG_TS | EVENTS_PER_MIN_MAX_AVG_TS | JSON + ERRORS_PER_MIN_TS | ERRORS_PER_MIN_TS | JSON + EVENTS_PER_MIN_TS | EVENTS_PER_MIN_TS | JSON + ENRICHED_ERROR_CODES | ENRICHED_ERROR_CODES | JSON + ERRORS_PER_MIN_ALERT_TS | ERRORS_PER_MIN_ALERT_TS | JSON + CLICK_USER_SESSIONS_TS | CLICK_USER_SESSIONS_TS | JSON + PAGES_PER_MIN_TS | PAGES_PER_MIN_TS | JSON + ENRICHED_ERROR_CODES_TS | ENRICHED_ERROR_CODES_TS | JSON + USER_IP_ACTIVITY_TS | USER_IP_ACTIVITY_TS | JSON + CUSTOMER_CLICKSTREAM | CUSTOMER_CLICKSTREAM | JSON + CLICKSTREAM | clickstream | JSON + +13. From the the KSQL CLI, verify that data is being streamed through + various tables and streams. + + **View clickstream data** + + .. code:: bash + + ksql> SELECT * FROM CLICKSTREAM LIMIT 5; + + Your output should resemble: + + .. 
code:: bash + + 1503585407989 | 222.245.174.248 | 1503585407989 | 24/Aug/2017:07:36:47 -0700 | 233.90.225.227 | GET /site/login.html HTTP/1.1 | 407 | 19 | 4096 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) + 1503585407999 | 233.168.257.122 | 1503585407999 | 24/Aug/2017:07:36:47 -0700 | 233.173.215.103 | GET /site/user_status.html HTTP/1.1 | 200 | 15 | 14096 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) + 1503585408009 | 222.168.57.122 | 1503585408009 | 24/Aug/2017:07:36:48 -0700 | 111.249.79.93 | GET /images/track.png HTTP/1.1 | 406 | 22 | 4096 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) + 1503585408019 | 122.145.8.244 | 1503585408019 | 24/Aug/2017:07:36:48 -0700 | 122.249.79.233 | GET /site/user_status.html HTTP/1.1 | 404 | 6 | 4006 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) + 1503585408029 | 222.152.45.45 | 1503585408029 | 24/Aug/2017:07:36:48 -0700 | 222.249.79.93 | GET /images/track.png HTTP/1.1 | 200 | 29 | 14096 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 + LIMIT reached for the partition. + Query terminated + + **View the events per minute** + + .. code:: bash + + ksql> SELECT * FROM EVENTS_PER_MIN_TS LIMIT 5; + + Your output should resemble: + + .. code:: bash + + 1503585450000 | 29^�8 | 1503585450000 | 29 | 19 + 1503585450000 | 37^�8 | 1503585450000 | 37 | 25 + 1503585450000 | 8^�8 | 1503585450000 | 8 | 35 + 1503585450000 | 36^�8 | 1503585450000 | 36 | 14 + 1503585450000 | 24^�8 | 1503585450000 | 24 | 22 + LIMIT reached for the partition. + Query terminated + + **View pages per minute** + + .. code:: bash + + ksql> SELECT * FROM PAGES_PER_MIN LIMIT 5; + + Your output should resemble: + + .. code:: bash + + 1503585475000 | 4 : Window{start=1503585475000 end=-} | 4 | 14 + 1503585480000 | 25 : Window{start=1503585480000 end=-} | 25 | 9 + 1503585480000 | 16 : Window{start=1503585480000 end=-} | 16 | 6 + 1503585475000 | 25 : Window{start=1503585475000 end=-} | 25 | 20 + 1503585480000 | 37 : Window{start=1503585480000 end=-} | 37 | 6 + LIMIT reached for the partition. + Query terminated + +14. Go to your terminal and send the KSQL tables to Elasticsearch and + Grafana. + + 1. From your terminal, navigate to the demo directory: + + .. code:: bash + + cd ksql-clickstream-demo/demo/ + + 2. Run this command to send the KSQL tables to Elasticsearch and + Grafana: + + .. code:: bash + + $ ./ksql-tables-to-grafana.sh + + Your output should resemble: + + .. code:: bash + + Loading Clickstream-Demo TABLES to Confluent-Connect => Elastic => Grafana datasource + Logging to: /tmp/ksql-connect.log + Charting CLICK_USER_SESSIONS_TS + Charting USER_IP_ACTIVITY_TS + Charting CLICKSTREAM_STATUS_CODES_TS + Charting ENRICHED_ERROR_CODES_TS + Charting ERRORS_PER_MIN_ALERT_TS + Charting ERRORS_PER_MIN_TS + Charting EVENTS_PER_MIN_MAX_AVG_TS + Charting EVENTS_PER_MIN_TS + Charting PAGES_PER_MIN_TS + Navigate to http://localhost:3000/dashboard/db/click-stream-analysis + + **Important:** The ``http://localhost:3000/`` URL is only + available inside the container. We will access the dashboard with + a slightly different URL, after running the next command. + + 3. From your terminal, load the dashboard into Grafana. + + .. code:: bash + + $ ./clickstream-analysis-dashboard.sh + + Your output should resemble: + + .. 
code:: bash + + Loading Grafana ClickStream Dashboard + {"slug":"click-stream-analysis","status":"success","version":1} + +15. Go to your browser and view the Grafana output at + http://localhost:3000/dashboard/db/click-stream-analysis. You can + login with user ID ``admin`` and password ``admin``. + + **Important:** If you already have Grafana UI open, you may need to + enter the specific clickstream URL: + http://localhost:3000/dashboard/db/click-stream-analysis. + + .. figure:: grafana-success.png + :alt: Grafana UI success + + Grafana UI success + +| Interesting things to try: \* Understand how the + ``clickstream-schema.sql`` file is structured. We use a + DataGen.KafkaTopic.clickstream -> Stream -> Table (for window & + analytics with group-by) -> Table (to Add EVENT_TS for time-index) -> + ElasticSearch/Connect topic +| \* Run the ``LIST TOPICS;`` command to see where data is persisted \* + Run the KSQL CLI ``history`` command + +Troubleshooting +--------------- + +- Docker must not be running on the host machine. +- Check that Elasticsearch is running: http://localhost:9200/. +- Check the Data Sources page in Grafana. + + - If your data source is shown, select it and scroll to the bottom + and click the **Save & Test** button. This will indicate whether + your data source is valid. + - If your data source is not shown, go to + ``/demo/`` and run + ``./ksql-tables-to-grafana.sh``. diff --git a/docs/ksql-clickstream-demo/pom.xml b/docs/ksql-clickstream-demo/pom.xml new file mode 100644 index 000000000000..8a3535a4afb9 --- /dev/null +++ b/docs/ksql-clickstream-demo/pom.xml @@ -0,0 +1,109 @@ + + + + + 4.0.0 + + + io.confluent.ksql + ksql-parent + 0.1-SNAPSHOT + + + io.confluent.ksql + ksql-clickstream-demo + KSQL Clickstream Analysis Demo + + + + io.confluent.ksql + ksql-examples + ${project.version} + + + + io.confluent.ksql + ksql-cli + ${project.version} + + + + + + + org.apache.maven.plugins + maven-assembly-plugin + + + src/assembly/development.xml + src/assembly/package.xml + src/assembly/standalone.xml + + + + ${main-class} + + + false + + + + make-assembly + package + + single + + + + + + + + + + packaging + + + + com.spotify + dockerfile-maven-plugin + ${dockerfile-maven-plugin.version} + + + default + + build + + + + ${project.artifactId} + ${project.version} + ${docker.registry} + + ${docker.tag} + ${docker.registry}confluentinc/${project.artifactId} + + + + + + + + + diff --git a/docs/make.bat b/docs/make.bat new file mode 100644 index 000000000000..54966cf1bdb8 --- /dev/null +++ b/docs/make.bat @@ -0,0 +1,242 @@ +@ECHO OFF + +REM Command file for Sphinx documentation + +if "%SPHINXBUILD%" == "" ( + set SPHINXBUILD=sphinx-build +) +set BUILDDIR=_build +set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% . +set I18NSPHINXOPTS=%SPHINXOPTS% . +if NOT "%PAPER%" == "" ( + set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS% + set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS% +) + +if "%1" == "" goto help + +if "%1" == "help" ( + :help + echo.Please use `make ^` where ^ is one of + echo. html to make standalone HTML files + echo. dirhtml to make HTML files named index.html in directories + echo. singlehtml to make a single large HTML file + echo. pickle to make pickle files + echo. json to make JSON files + echo. htmlhelp to make HTML files and a HTML help project + echo. qthelp to make HTML files and a qthelp project + echo. devhelp to make HTML files and a Devhelp project + echo. epub to make an epub + echo. 
latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter + echo. text to make text files + echo. man to make manual pages + echo. texinfo to make Texinfo files + echo. gettext to make PO message catalogs + echo. changes to make an overview over all changed/added/deprecated items + echo. xml to make Docutils-native XML files + echo. pseudoxml to make pseudoxml-XML files for display purposes + echo. linkcheck to check all external links for integrity + echo. doctest to run all doctests embedded in the documentation if enabled + goto end +) + +if "%1" == "clean" ( + for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i + del /q /s %BUILDDIR%\* + goto end +) + + +%SPHINXBUILD% 2> nul +if errorlevel 9009 ( + echo. + echo.The 'sphinx-build' command was not found. Make sure you have Sphinx + echo.installed, then set the SPHINXBUILD environment variable to point + echo.to the full path of the 'sphinx-build' executable. Alternatively you + echo.may add the Sphinx directory to PATH. + echo. + echo.If you don't have Sphinx installed, grab it from + echo.http://sphinx-doc.org/ + exit /b 1 +) + +if "%1" == "html" ( + %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The HTML pages are in %BUILDDIR%/html. + goto end +) + +if "%1" == "dirhtml" ( + %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml. + goto end +) + +if "%1" == "singlehtml" ( + %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml. + goto end +) + +if "%1" == "pickle" ( + %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle + if errorlevel 1 exit /b 1 + echo. + echo.Build finished; now you can process the pickle files. + goto end +) + +if "%1" == "json" ( + %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json + if errorlevel 1 exit /b 1 + echo. + echo.Build finished; now you can process the JSON files. + goto end +) + +if "%1" == "htmlhelp" ( + %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp + if errorlevel 1 exit /b 1 + echo. + echo.Build finished; now you can run HTML Help Workshop with the ^ +.hhp project file in %BUILDDIR%/htmlhelp. + goto end +) + +if "%1" == "qthelp" ( + %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp + if errorlevel 1 exit /b 1 + echo. + echo.Build finished; now you can run "qcollectiongenerator" with the ^ +.qhcp project file in %BUILDDIR%/qthelp, like this: + echo.^> qcollectiongenerator %BUILDDIR%\qthelp\KafkaRESTProxy.qhcp + echo.To view the help file: + echo.^> assistant -collectionFile %BUILDDIR%\qthelp\KafkaRESTProxy.ghc + goto end +) + +if "%1" == "devhelp" ( + %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. + goto end +) + +if "%1" == "epub" ( + %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The epub file is in %BUILDDIR%/epub. + goto end +) + +if "%1" == "latex" ( + %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex + if errorlevel 1 exit /b 1 + echo. + echo.Build finished; the LaTeX files are in %BUILDDIR%/latex. + goto end +) + +if "%1" == "latexpdf" ( + %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex + cd %BUILDDIR%/latex + make all-pdf + cd %BUILDDIR%/.. + echo. + echo.Build finished; the PDF files are in %BUILDDIR%/latex. 
+ goto end +) + +if "%1" == "latexpdfja" ( + %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex + cd %BUILDDIR%/latex + make all-pdf-ja + cd %BUILDDIR%/.. + echo. + echo.Build finished; the PDF files are in %BUILDDIR%/latex. + goto end +) + +if "%1" == "text" ( + %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The text files are in %BUILDDIR%/text. + goto end +) + +if "%1" == "man" ( + %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The manual pages are in %BUILDDIR%/man. + goto end +) + +if "%1" == "texinfo" ( + %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo. + goto end +) + +if "%1" == "gettext" ( + %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The message catalogs are in %BUILDDIR%/locale. + goto end +) + +if "%1" == "changes" ( + %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes + if errorlevel 1 exit /b 1 + echo. + echo.The overview file is in %BUILDDIR%/changes. + goto end +) + +if "%1" == "linkcheck" ( + %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck + if errorlevel 1 exit /b 1 + echo. + echo.Link check complete; look for any errors in the above output ^ +or in %BUILDDIR%/linkcheck/output.txt. + goto end +) + +if "%1" == "doctest" ( + %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest + if errorlevel 1 exit /b 1 + echo. + echo.Testing of doctests in the sources finished, look at the ^ +results in %BUILDDIR%/doctest/output.txt. + goto end +) + +if "%1" == "xml" ( + %SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The XML files are in %BUILDDIR%/xml. + goto end +) + +if "%1" == "pseudoxml" ( + %SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml. + goto end +) + +:end diff --git a/docs/quickstart/index.rst b/docs/quickstart/index.rst new file mode 100644 index 000000000000..caaab2367029 --- /dev/null +++ b/docs/quickstart/index.rst @@ -0,0 +1,263 @@ +.. _ksql_quickstart: + +Quick Start +=========== + +.. important:: + + This release is a developer preview. It is strongly recommended that you test before running KSQL against a production Kafka cluster. + +The goal of this quick start is to demonstrate a simple workflow using +KSQL to write streaming queries against messages in Kafka. + +.. contents:: + :depth: 2 + +Setup +----- + +.. toctree:: + :hidden: + + quickstart-non-docker + quickstart-docker + +To get started, you must start a Kafka cluster, including ZooKeeper and a Kafka broker. KSQL will then query messages from this Kafka cluster. + +Launch a Kafka cluster, start KSQL, and produce messages to query. Follow the instructions based on whether you are using Docker: + +:ref:`ksql_quickstart_docker` (recommended) + The Docker container starts a Kafka cluster, starts KSQL, and automatically runs a data generator that continuously produces Kafka messages to the Kafka cluster. No additional setup is required. + +:ref:`ksql_quickstart_non_docker` + With this method you start a Kafka cluster, start KSQL, and manually run a data generator to produce topics called ``pageviews`` and ``users`` + +.. 
_create-a-stream-and-table: + +Create a Stream and Table +------------------------- + +These examples query messages from Kafka topics called ``pageviews`` and ``users`` using the following schemas: + +.. image:: ../img/ksql-quickstart-schemas.jpg + +#. Create a STREAM ``pageviews_original`` from the Kafka topic + ``pageviews``, specifying the ``value_format`` of ``DELIMITED``. + Describe the new STREAM. Notice that KSQL created additional columns + called ``ROWTIME``, which corresponds to the Kafka message timestamp, + and ``ROWKEY``, which corresponds to the Kafka message key. + + .. code:: bash + + ksql> CREATE STREAM pageviews_original (viewtime BIGINT, userid VARCHAR, pageid VARCHAR) WITH (kafka_topic='pageviews', value_format='DELIMITED'); + + ksql> DESCRIBE pageviews_original; + + Field | Type + ---------------------------- + ROWTIME | BIGINT + ROWKEY | VARCHAR(STRING) + VIEWTIME | BIGINT + USERID | VARCHAR(STRING) + PAGEID | VARCHAR(STRING) + +#. Create a TABLE ``users_original`` from the Kafka topic ``users``, + specifying the ``value_format`` of ``JSON``. Describe the new TABLE. + + .. code:: bash + + ksql> CREATE TABLE users_original (registertime BIGINT, gender VARCHAR, regionid VARCHAR, userid VARCHAR) WITH (kafka_topic='users', value_format='JSON'); + + ksql> DESCRIBE users_original; + + Field | Type + -------------------------------- + ROWTIME | BIGINT + ROWKEY | VARCHAR(STRING) + REGISTERTIME | BIGINT + GENDER | VARCHAR(STRING) + REGIONID | VARCHAR(STRING) + USERID | VARCHAR(STRING) + +#. Show all STREAMS and TABLES. + + .. code:: bash + + ksql> SHOW STREAMS; + + Stream Name | Kafka Topic | Format + ----------------------------------------------------------------- + PAGEVIEWS_ORIGINAL | pageviews | DELIMITED + + ksql> SHOW TABLES; + + Table Name | Kafka Topic | Format | Windowed + -------------------------------------------------------------- + USERS_ORIGINAL | users | JSON | false + +Write Queries +------------- + +These examples write queries using KSQL. + +**Note:** By default KSQL reads the topics for streams and tables from +the latest offset. + +#. Use ``SELECT`` to create a query that returns data from a STREAM. To + stop viewing the data, press ````. You may optionally include + the ``LIMIT`` keyword to limit the number of rows returned in the + query result. Note that exact data output may vary because of the + randomness of the data generation. + + .. code:: bash + + ksql> SELECT pageid FROM pageviews_original LIMIT 3; + Page_24 + Page_73 + Page_78 + LIMIT reached for the partition. + Query terminated + ksql> + +#. Create a persistent query by using the ``CREATE STREAM`` keywords to + precede the ``SELECT`` statement. Unlike the non-persistent query + above, results from this query are written to a Kafka topic + ``PAGEVIEWS_FEMALE``. The query below enriches the ``pageviews`` + STREAM by doing a ``LEFT JOIN`` with the ``users_original`` TABLE on + the user ID, where a condition is met. + + .. code:: bash + + ksql> CREATE STREAM pageviews_female AS SELECT users_original.userid AS userid, pageid, regionid, gender FROM pageviews_original LEFT JOIN users_original ON pageviews_original.userid = users_original.userid WHERE gender = 'FEMALE'; + + ksql> DESCRIBE pageviews_female; + Field | Type + ---------------------------- + ROWTIME | BIGINT + ROWKEY | VARCHAR(STRING) + USERID | VARCHAR(STRING) + PAGEID | VARCHAR(STRING) + REGIONID | VARCHAR(STRING) + GENDER | VARCHAR(STRING) + +#. Use ``SELECT`` to view query results as they come in. 
To stop viewing + the query results, press ````. This stops printing to the + console but it does not terminate the actual query. The query + continues to run in the underlying KSQL application. + + .. code:: bash + + ksql> SELECT * FROM pageviews_female; + 1502477856762 | User_2 | User_2 | Page_55 | Region_9 | FEMALE + 1502477857946 | User_5 | User_5 | Page_14 | Region_2 | FEMALE + 1502477858436 | User_3 | User_3 | Page_60 | Region_3 | FEMALE + ^CQuery terminated + ksql> + +#. Create a new persistent query where another condition is met, using + ``LIKE``. Results from this query are written to a Kafka topic called + ``pageviews_enriched_r8_r9``. + + .. code:: bash + + ksql> CREATE STREAM pageviews_female_like_89 WITH (kafka_topic='pageviews_enriched_r8_r9', value_format='DELIMITED') AS SELECT * FROM pageviews_female WHERE regionid LIKE '%_8' OR regionid LIKE '%_9'; + +#. Create a new persistent query that counts the pageviews for each + region and gender combination in a `tumbling + window `__ + of 30 seconds when the count is greater than 1. Results from this + query are written to a Kafka topic called ``PAGEVIEWS_REGIONS``. + + .. code:: bash + + ksql> CREATE TABLE pageviews_regions AS SELECT gender, regionid , COUNT(*) AS numusers FROM pageviews_female WINDOW TUMBLING (size 30 second) GROUP BY gender, regionid HAVING COUNT(*) > 1; + + ksql> DESCRIBE pageviews_regions; + + Field | Type + ---------------------------- + ROWTIME | BIGINT + ROWKEY | VARCHAR(STRING) + GENDER | VARCHAR(STRING) + REGIONID | VARCHAR(STRING) + NUMUSERS | BIGINT + +#. Use ``SELECT`` to view results from the above query. + + .. code:: bash + + ksql> SELECT regionid, numusers FROM pageviews_regions LIMIT 5; + Region_3 | 4 + Region_3 | 5 + Region_6 | 5 + Region_6 | 6 + Region_3 | 8 + LIMIT reached for the partition. + Query terminated + ksql> + +#. Show all persistent queries. + + .. code:: bash + + ksql> SHOW QUERIES; + + Query ID | Kafka Topic | Query String + ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + 1 | PAGEVIEWS_FEMALE | CREATE STREAM pageviews_female AS SELECT users_original.userid AS userid, pageid, regionid, gender FROM pageviews_original LEFT JOIN users_original ON pageviews_original.userid = users_original.userid WHERE gender = 'FEMALE'; + 2 | pageviews_enriched_r8_r9 | CREATE STREAM pageviews_female_like_89 WITH (kafka_topic='pageviews_enriched_r8_r9', value_format='DELIMITED') AS SELECT * FROM pageviews_female WHERE regionid LIKE '%_8' OR regionid LIKE '%_9'; + 3 | PAGEVIEWS_REGIONS | CREATE TABLE pageviews_regions AS SELECT gender, regionid , COUNT(*) AS numusers FROM pageviews_female WINDOW TUMBLING (size 30 second) GROUP BY gender, regionid HAVING COUNT(*) > 1; + +Terminate and Exit +------------------ + +KSQL +~~~~ + +**Important:** Queries will continuously run as KSQL applications until +they are manually terminated. Exiting KSQL does not terminate persistent +queries. + +#. From the output of ``SHOW QUERIES;`` identify a query ID you would + like to terminate. For example, if you wish to terminate query ID + ``2``: + + .. code:: bash + + ksql> TERMINATE 2; + +#. To exit from KSQL, type ‘exit’. + + .. code:: bash + + ksql> exit + +Docker +~~~~~~ + +If you are running Docker Compose, you must explicitly shut down Docker +Compose. 
For more information, see the `docker-compose +down `__ documentation. + +**Important:** This command will delete all KSQL queries and topic data. + +.. code:: bash + + $ docker-compose down + +Confluent Platform +~~~~~~~~~~~~~~~~~~ + +If you are running the Confluent Platform, you can stop it with this +command. + +.. code:: bash + + $ confluent stop + +Next steps +---------- + +Try the end-to-end :ref:`Clickstream Analysis +demo `, which shows how +to build an application that performs real-time user analytics. diff --git a/docs/quickstart/quickstart-docker.rst b/docs/quickstart/quickstart-docker.rst new file mode 100644 index 000000000000..47af02fdfdd2 --- /dev/null +++ b/docs/quickstart/quickstart-docker.rst @@ -0,0 +1,200 @@ +.. _ksql_quickstart_docker: + +Running KSQL Using Docker +========================= + +This topic describes how to setup a Kafka cluster and start KSQL in a Docker container. After you complete these steps, +you can return to the :ref:`create-a-stream-and-table` and start querying the data in the Kafka cluster. + +.. contents:: Contents + :local: + :depth: 1 + +**Prerequisites** + +- Docker + - `macOS `__ + - `All platforms `__ +- `Git `__ +- Java: Minimum version 1.8 + +Start a Kafka cluster +--------------------- + +1. Clone the Confluent KSQL repository. + + .. code:: bash + + $ git clone git@github.com:confluentinc/ksql.git + +2. Change directory to the quickstart and launch the KSQL quick start in + Docker. + + .. code:: bash + + $ cd ksql/docs/quickstart/ + $ docker-compose up -d + + After you have successfully started the Kafka cluster and started + KSQL, you will see the KSQL prompt: + + .. code:: bash + + ====================================== + = _ __ _____ ____ _ = + = | |/ // ____|/ __ \| | = + = | ' /| (___ | | | | | = + = | < \___ \| | | | | = + = | . \ ____) | |__| | |____ = + = |_|\_\_____/ \___\_\______| = + = = + = Streaming SQL Engine for Kafka = + Copyright 2017 Confluent Inc. + + CLI v0.2, Server v0.1 located at http://localhost:9098 + + Having trouble? Type 'help' (case-insensitive) for a rundown of how things work! + + ksql> + +Proceed to :ref:`start-ksql`. + +.. _start-ksql: + +Start KSQL +---------- + +1. From the host machine, start KSQL CLI on the container. + + .. code:: bash + + $ docker-compose exec ksql-cli ksql-cli local --bootstrap-server kafka:29092 + +2. Return to the :ref:`create-a-stream-and-table` to start querying the + data in the Kafka cluster. + +Appendix +-------- + +The following instructions in the Appendix are not required to run the +quick start. They are optional steps to produce extra topic data and +verify the environment. + +Produce more topic data +~~~~~~~~~~~~~~~~~~~~~~~ + +The Compose file automatically runs a data generator that continuously +produces data to two Kafka topics ``pageviews`` and ``users``. No +further action is required if you want to use just the data available. +You can return to the :ref:`main KSQL quick +start ` to start querying the +data in these two topics. + +However, if you want to produce additional data, you can use any of the +following methods. + +- Produce Kafka data with the Kafka command line + ``kafka-console-producer``. The following example generates data with + a value in DELIMITED format. + + .. code:: bash + + $ docker-compose exec kafka kafka-console-producer --topic t1 --broker-list kafka:29092 --property parse.key=true --property key.separator=: + + Your data input should resemble this. + + .. 
code:: bash + + key1:v1,v2,v3 + key2:v4,v5,v6 + key3:v7,v8,v9 + key1:v10,v11,v12 + +- Produce Kafka data with the Kafka command line + ``kafka-console-producer``. The following example generates data with + a value in JSON format. + + .. code:: bash + + $ docker-compose exec kafka kafka-console-producer --topic t2 --broker-list kafka:29092 --property parse.key=true --property key.separator=: + + Your data input should resemble this. + + .. code:: bash + + key1:{"id":"key1","col1":"v1","col2":"v2","col3":"v3"} + key2:{"id":"key2","col1":"v4","col2":"v5","col3":"v6"} + key3:{"id":"key3","col1":"v7","col2":"v8","col3":"v9"} + key1:{"id":"key1","col1":"v10","col2":"v11","col3":"v12"} + +Verify your environment +~~~~~~~~~~~~~~~~~~~~~~~ + +The next three steps are optional verification steps to ensure your +environment is properly setup. + +1. Verify that six Docker containers were created. + + .. code:: bash + + $ docker-compose ps + + Your output should resemble this. Take note of the ``Up`` state. + + .. code:: bash + + Name Command State Ports + ------------------------------------------------------------------------------------------------------------------------- + quickstart_kafka_1 /etc/confluent/docker/run Up 0.0.0.0:29092->29092/tcp, 0.0.0.0:9092->9092/tcp + quickstart_ksql-cli_1 perl -e while(1){ sleep 99 ... Up + quickstart_ksql-datagen-pageviews_1 bash -c echo Waiting for K ... Up + quickstart_ksql-datagen-users_1 bash -c echo Waiting for K ... Up + quickstart_schema-registry_1 /etc/confluent/docker/run Up 0.0.0.0:8081->8081/tcp + quickstart_zookeeper_1 /etc/confluent/docker/run Up 2181/tcp, 2888/tcp, 0.0.0.0:32181->32181/tcp, 3888/tcp + +2. The docker-compose file already runs a data generator that + pre-populates two Kafka topics ``pageviews`` and ``users`` with mock + data. Verify that the data generator created two Kafka topics, + including ``pageviews`` and ``users``. + + .. code:: bash + + $ docker-compose exec kafka kafka-topics --zookeeper zookeeper:32181 --list + + Your output should resemble this. + + .. code:: bash + + _confluent-metrics + _schemas + pageviews + users + +3. Use the ``kafka-console-consumer`` to view a few messages from each + topic. The topic ``pageviews`` has a key that is a mock time stamp + and a value that is in ``DELIMITED`` format. The topic ``users`` has + a key that is the user ID and a value that is in ``Json`` format. + + .. code:: bash + + $ docker-compose exec kafka kafka-console-consumer --topic pageviews --bootstrap-server kafka:29092 --from-beginning --max-messages 3 --property print.key=true + + Your output should resemble this. + + .. code:: bash + + 1491040409254 1491040409254,User_5,Page_70 + 1488611895904 1488611895904,User_8,Page_76 + 1504052725192 1504052725192,User_8,Page_92 + + .. code:: bash + + $ docker-compose exec kafka kafka-console-consumer --topic users --bootstrap-server kafka:29092 --from-beginning --max-messages 3 --property print.key=true + + Your output should resemble this. + + .. code:: bash + + User_2 {"registertime":1509789307038,"gender":"FEMALE","regionid":"Region_1","userid":"User_2"} + User_6 {"registertime":1498248577697,"gender":"OTHER","regionid":"Region_8","userid":"User_6"} + User_8 {"registertime":1494834474504,"gender":"MALE","regionid":"Region_5","userid":"User_8"} diff --git a/docs/quickstart/quickstart-non-docker.rst b/docs/quickstart/quickstart-non-docker.rst new file mode 100644 index 000000000000..da72f4398caf --- /dev/null +++ b/docs/quickstart/quickstart-non-docker.rst @@ -0,0 +1,152 @@ +.. 
_ksql_quickstart_non_docker: + +Running KSQL Locally +==================== + +This topic describes how to setup a Kafka cluster and start KSQL in your local environment. After you complete these steps, +you can return to the :ref:`create-a-stream-and-table` and start querying the data in the Kafka cluster. + +.. contents:: Contents + :local: + :depth: 1 + +Prerequisites + - :ref:`Confluent Platform 4.0.0 ` or later is installed. This installation includes a Kafka broker, ZooKeeper, Schema Registry, REST Proxy, and Kafka Connect. + - If you installed Confluent Platform via TAR or ZIP, navigate into the installation directory. The paths and commands used throughout this quick start assume that your are in this installation directory. + - `Maven `__ + - `Git `__ + - Java: Minimum version 1.8. Install Oracle Java JRE or JDK >= 1.8 on your local machine + +Start Kafka +----------- + +Navigate to the ``confluent-3.3.0`` directory and start the Confluent +Platform using the new Confluent CLI (part of the free Confluent Open +Source distribution). ZooKeeper is listening on ``localhost:2181``, +Kafka broker is listening on ``localhost:9092``, and Confluent Schema +Registry is listening on ``localhost:8081``. + +.. code:: bash + + $ /bin/confluent start + +Your output should resemble this. + +.. code:: bash + + Starting zookeeper + zookeeper is [UP] + Starting kafka + kafka is [UP] + Starting schema-registry + schema-registry is [UP] + Starting kafka-rest + kafka-rest is [UP] + Starting connect + connect is [UP] + +Start KSQL +---------- + +1. Clone the Confluent KSQL repository. + + .. code:: bash + + $ git clone git@github.com:confluentinc/ksql.git + +2. Change directory to the ``ksql`` directory and compile the code. + + .. code:: bash + + $ cd ksql + $ mvn clean compile install -DskipTests + +3. Start KSQL. The ``local`` argument starts KSQL in :ref:`standalone + mode `. + + .. code:: bash + + $ ./bin/ksql-cli local + + After you have successfully started the Kafka cluster and started + KSQL, you will see the KSQL prompt: + + .. code:: bash + + ====================================== + = _ __ _____ ____ _ = + = | |/ // ____|/ __ \| | = + = | ' /| (___ | | | | | = + = | < \___ \| | | | | = + = | . \ ____) | |__| | |____ = + = |_|\_\_____/ \___\_\______| = + = = + = Streaming SQL Engine for Kafka = + Copyright 2017 Confluent Inc. + + CLI v0.2, Server v0.1 located at http://localhost:9098 + + Having trouble? Type 'help' (case-insensitive) for a rundown of how things work! + + ksql> + +See the steps below to generate data to the Kafka cluster. + +.. _produce-topic-data: + +Produce topic data +------------------ + +Minimally, to use the :ref:`ksql_quickstart`, you must run the following +steps to produce data to the Kafka topics ``pageviews`` and ``users``. + +1. Produce Kafka data to the ``pageviews`` topic using the data + generator. The following example continuously generates data with a + value in DELIMITED format. + + .. code:: bash + + $ java -jar ksql-examples/target/ksql-examples-0.1-SNAPSHOT-standalone.jar \ + quickstart=pageviews format=delimited topic=pageviews maxInterval=10000 + +2. Produce Kafka data to the ``users`` topic using the data generator. + The following example continuously generates data with a value in + JSON format. + + .. 
code:: bash + + $ java -jar ksql-examples/target/ksql-examples-0.1-SNAPSHOT-standalone.jar \ + quickstart=users format=json topic=users maxInterval=10000 + +Optionally, you can return to the :ref:`main KSQL quick start +page ` to start querying the Kafka +cluster. Or you can do additional testing with topic data produced from +the command line tools. + +1. You can produce Kafka data with the Kafka command line + ``kafka-console-producer``. The following example generates data with + a value in DELIMITED format. + + .. code:: bash + + $ kafka-console-producer --broker-list localhost:9092 \ + --topic t1 \ + --property parse.key=true \ + --property key.separator=: + key1:v1,v2,v3 + key2:v4,v5,v6 + key3:v7,v8,v9 + key1:v10,v11,v12 + +2. This example generates data with a value in JSON format. + + .. code:: bash + + $ kafka-console-producer --broker-list localhost:9092 \ + --topic t2 \ + --property parse.key=true \ + --property key.separator=: + key1:{"id":"key1","col1":"v1","col2":"v2","col3":"v3"} + key2:{"id":"key2","col1":"v4","col2":"v5","col3":"v6"} + key3:{"id":"key3","col1":"v7","col2":"v8","col3":"v9"} + key1:{"id":"key1","col1":"v10","col2":"v11","col3":"v12"} diff --git a/docs/syntax-reference.rst b/docs/syntax-reference.rst new file mode 100644 index 000000000000..3b6a119922bb --- /dev/null +++ b/docs/syntax-reference.rst @@ -0,0 +1,753 @@ +.. _ksql_syntax_reference: + +Syntax Reference +================ + +The KSQL CLI provides a terminal-based interactive shell for running +queries. + +.. contents:: Contents + :local: + :depth: 1 + +===================== +CLI-specific commands +===================== + +Unlike KSQL statements such as ``SELECT``, these commands are for +setting a KSQL configuration, exiting the CLI, etc. Run the CLI with +``--help`` to see the available options. + +**Tip:** You can search and browse your command history in the KSQL CLI +with ``Ctrl-R``. After pressing ``Ctrl-R``, start typing the command or +any part of the command to show an auto-complete of past commands. + +.. code:: bash + + Description: + The KSQL CLI provides a terminal-based interactive shell for running queries. Each command must be on a separate + line. For KSQL command syntax, see the documentation at https://github.com/confluentinc/ksql/docs/. + + help: + Show this message. + + clear: + Clear the current terminal. + + output: + View the current output format. + + output : + Set the output format to (valid formats: 'JSON', 'TABULAR') + For example: "output JSON" + + history: + Show previous lines entered during the current CLI session. You can use up and down arrow keys to navigate to the + previous lines too. + + version: + Get the current KSQL version. + + exit: + Exit the CLI. + + + Default behavior: + + Lines are read one at a time and are sent to the server as KSQL unless one of the following is true: + + 1. The line is empty or entirely whitespace. In this case, no request is made to the server. + + 2. The line ends with backslash (`\`). In this case, lines are continuously read and stripped of their trailing + newline and `\` until one is encountered that does not end with `\`; then, the concatenation of all lines read + during this time is sent to the server as KSQL. + +=============== +KSQL statements +=============== + +.. tip:: + + - KSQL statements must be terminated with a semicolon (``;``). + - Multi-line statements: + + - In the CLI you must use a backslash (``\``) to indicate + continuation of a statement on the next line. 
+ - Do not use ``\`` for multi-line statements in ``.sql`` files. + + +.. contents:: Available KSQL statements: + :local: + :depth: 1 + +CREATE STREAM +------------- + +**Synopsis** + +.. code:: sql + + CREATE STREAM stream_name ( { column_name data_type } [, ...] ) + WITH ( property_name = expression [, ...] ); + +**Description** + +Create a new stream with the specified columns and properties. + +The supported column data types are: + +- ``BOOLEAN`` +- ``INTEGER`` +- ``BIGINT`` +- ``DOUBLE`` +- ``VARCHAR`` (or ``STRING``) +- ``ARRAY`` (JSON only) +- ``MAP`` (JSON only) + +KSQL adds the implicit columns ``ROWTIME`` and ``ROWKEY`` to every +stream and table, which represent the corresponding Kafka message +timestamp and message key, respectively. + +The WITH clause supports the following properties: + ++--------------+-------------------------------------------------------+ +| Property | Description | ++==============+=======================================================+ +| KAFKA_TOPIC | The name of the Kafka topic that backs this stream. | +| (required) | The topic must already exist in Kafka. | ++--------------+-------------------------------------------------------+ +| VALUE_FORMAT | Specifies the serialization format of the message | +| (required) | value in the topic. Supported formats: ``JSON``, | +| | ``DELIMITED`` | ++--------------+-------------------------------------------------------+ +| KEY | Associates the message key in the Kafka topic with a | +| | column in the KSQL stream. | ++--------------+-------------------------------------------------------+ +| TIMESTAMP | Associates the message timestamp in the Kafka topic | +| | with a column in the KSQL stream. Time-based | +| | operations such as windowing will process a record | +| | according to this timestamp. | ++--------------+-------------------------------------------------------+ + +Example: + +.. code:: sql + + CREATE STREAM pageviews (viewtime BIGINT, user_id VARCHAR, page_id VARCHAR) + WITH (VALUE_FORMAT = 'JSON', + KAFKA_TOPIC = 'my-pageviews-topic'); + +CREATE TABLE +------------ + +**Synopsis** + +.. code:: sql + + CREATE TABLE table_name ( { column_name data_type } [, ...] ) + WITH ( property_name = expression [, ...] ); + +**Description** + +Create a new table with the specified columns and properties. + +The supported column data types are: + +- ``BOOLEAN`` +- ``INTEGER`` +- ``BIGINT`` +- ``DOUBLE`` +- ``VARCHAR`` (or ``STRING``) +- ``ARRAY`` (JSON only) +- ``MAP`` (JSON only) + +KSQL adds the implicit columns ``ROWTIME`` and ``ROWKEY`` to every +stream and table, which represent the corresponding Kafka message +timestamp and message key, respectively. + +The WITH clause supports the following properties: + ++--------------+-------------------------------------------------------+ +| Property | Description | ++==============+=======================================================+ +| KAFKA_TOPIC | The name of the Kafka topic that backs this table. | +| (required) | The topic must already exist in Kafka. | ++--------------+-------------------------------------------------------+ +| VALUE_FORMAT | Specifies the serialization format of the message | +| (required) | value in the topic. Supported formats: ``JSON``, | +| | ``DELIMITED`` | ++--------------+-------------------------------------------------------+ +| KEY | Associates the message key in the Kafka topic with a | +| | column in the KSQL table. 
| ++--------------+-------------------------------------------------------+ +| TIMESTAMP | Associates the message timestamp in the Kafka topic | +| | with a column in the KSQL table. Time-based | +| | operations such as windowing will process a record | +| | according to this timestamp. | ++--------------+-------------------------------------------------------+ + +Example: + +.. code:: sql + + CREATE TABLE users (usertimestamp BIGINT, user_id VARCHAR, gender VARCHAR, region_id VARCHAR) + WITH (VALUE_FORMAT = 'JSON', + KAFKA_TOPIC = 'my-users-topic'); + +CREATE STREAM AS SELECT +----------------------- + +**Synopsis** + +.. code:: sql + + CREATE STREAM stream_name + [WITH ( property_name = expression [, ...] )] + AS SELECT select_expr [, ...] + FROM from_item [, ...] + [ WHERE condition ] + [PARTITION BY column_name]; + +**Description** + +Create a new stream along with the corresponding Kafka topic, and +continuously write the result of the SELECT query into the stream and +its corresponding topic. + +If the PARTITION BY clause is present, then the resulting stream will +have the specified column as its key. + +The WITH clause supports the following properties: + ++--------------+-------------------------------------------------------+ +| Property | Description | ++==============+=======================================================+ +| KAFKA_TOPIC | The name of the Kafka topic that backs this stream. | +| | If this property is not set, then the name of the | +| | stream will be used as default. | ++--------------+-------------------------------------------------------+ +| VALUE_FORMAT | Specifies the serialization format of the message | +| | value in the topic. Supported formats: ``JSON``, | +| | ``DELIMITED``. If this property is not set, then the | +| | format of the input stream/table will be used. | ++--------------+-------------------------------------------------------+ +| PARTITIONS | The number of partitions in the topic. If this | +| | property is not set, then the number of partitions of | +| | the input stream/table will be used. | ++--------------+-------------------------------------------------------+ +| REPLICATIONS | The replication factor for the topic. If this | +| | property is not set, then the number of replicas of | +| | the input stream/table will be used. | ++--------------+-------------------------------------------------------+ +| TIMESTAMP | Associates the message timestamp in the Kafka topic | +| | with a column in the KSQL stream. Time-based | +| | operations such as windowing will process a record | +| | according to this timestamp. | ++--------------+-------------------------------------------------------+ + +Note: The ``KEY`` property is not supported – use PARTITION BY instead. + +CREATE TABLE AS SELECT +---------------------- + +**Synopsis** + +.. code:: sql + + CREATE TABLE stream_name + [WITH ( property_name = expression [, ...] )] + AS SELECT select_expr [, ...] + FROM from_item [, ...] + [ WINDOW window_expression ] + [ WHERE condition ] + [ GROUP BY grouping_expression ] + [ HAVING having_expression ]; + +**Description** + +Create a new KSQL table along with the corresponding Kafka topic and +stream the result of the SELECT query as a changelog into the topic. 
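+
+For example, the quick start in this documentation creates a windowed
+aggregate table with a statement like the following (a sketch that assumes
+the ``pageviews_female`` stream from the quick start already exists):
+
+.. code:: sql
+
+    CREATE TABLE pageviews_regions AS
+      SELECT gender, regionid, COUNT(*) AS numusers
+      FROM pageviews_female
+      WINDOW TUMBLING (SIZE 30 SECONDS)
+      GROUP BY gender, regionid
+      HAVING COUNT(*) > 1;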
+ +The WITH clause supports the following properties: + ++--------------+-------------------------------------------------------+ +| Property | Description | ++==============+=======================================================+ +| KAFKA_TOPIC | The name of the Kafka topic that backs this table. If | +| | this property is not set, then the name of the table | +| | will be used as default. | ++--------------+-------------------------------------------------------+ +| VALUE_FORMAT | Specifies the serialization format of the message | +| | value in the topic. Supported formats: ``JSON``, | +| | ``DELIMITED``. If this property is not set, then the | +| | format of the input stream/table will be used. | ++--------------+-------------------------------------------------------+ +| PARTITIONS | The number of partitions in the topic. If this | +| | property is not set, then the number of partitions of | +| | the input stream/table will be used. | ++--------------+-------------------------------------------------------+ +| REPLICATIONS | The replication factor for the topic. If this | +| | property is not set, then the number of replicas of | +| | the input stream/table will be used. | ++--------------+-------------------------------------------------------+ +| TIMESTAMP | Associates the message timestamp in the Kafka topic | +| | with a column in the KSQL table. Time-based | +| | operations such as windowing will process a record | +| | according to this timestamp. | ++--------------+-------------------------------------------------------+ + + +DESCRIBE +-------- + +**Synopsis** + +.. code:: sql + + DESCRIBE [EXTENDED] (stream_name|table_name); + +**Description** + +List the columns in a stream or table along with their data type and +other attributes. + +* DESCRIBE: List the columns in a stream or table along with their data type and other attributes. +* DESCRIBE EXTENDED: Display DESCRIBE information with additional runtime statistics, Kafka topic details, and the + set of queries that populate the table or stream. + +Example of describing a table: + +.. code:: bash + + ksql> DESCRIBE ip_sum; + + Field | Type + ------------------------------------- + ROWTIME | BIGINT (system) + ROWKEY | VARCHAR(STRING) (system) + IP | VARCHAR(STRING) (key) + KBYTES | BIGINT + ------------------------------------- + For runtime statistics and query details run: DESCRIBE EXTENDED + +Example of describing a table with extended information: + +.. code:: bash + + ksql> DESCRIBE EXTENDED ip_sum; + Type : TABLE + Key field : CLICKSTREAM.IP + Timestamp field : Not set - using + Key format : STRING + Value format : JSON + Kafka output topic : IP_SUM (partitions: 4, replication: 1) + + Field | Type + ------------------------------------- + ROWTIME | BIGINT (system) + ROWKEY | VARCHAR(STRING) (system) + IP | VARCHAR(STRING) (key) + KBYTES | BIGINT + ------------------------------------- + + Queries that write into this TABLE + ----------------------------------- + id:CTAS_IP_SUM - CREATE TABLE IP_SUM as SELECT ip, sum(bytes)/1024 as kbytes FROM CLICKSTREAM window SESSION (300 second) GROUP BY ip; + + For query topology and execution plan please run: EXPLAIN ; for more information + + Local runtime statistics + ------------------------ + messages-per-sec: 4.41 total-messages: 486 last-message: 12/14/17 4:32:23 PM GMT + failed-messages: 0 last-failed: n/a + (Statistics of the local Ksql Server interaction with the Kafka topic IP_SUM) + + +EXPLAIN +------- + +**Synopsis** + +.. 
code:: sql + + EXPLAIN (sql_expression|query_id); + +**Description** + +Show the execution plan for a SQL expression or, given the id of a running query, show the execution plan plus +additional runtime information and metrics. Statements such as DESCRIBE EXTENDED, for example, show the ids of +queries related to a stream or table. + +Example of explaining a running query: + +.. code:: bash + + ksql> EXPLAIN ctas_ip_sum; + + Type : QUERY + SQL : CREATE TABLE IP_SUM as SELECT ip, sum(bytes)/1024 as kbytes FROM CLICKSTREAM window SESSION (300 second) GROUP BY ip; + + + Local runtime statistics + ------------------------ + messages-per-sec: 104.38 total-messages: 14238 last-message: 12/14/17 4:30:42 PM GMT + failed-messages: 0 last-failed: n/a + (Statistics of the local Ksql Server interaction with the Kafka topic IP_SUM) + + Execution plan + -------------- + > [ PROJECT ] Schema: [IP : STRING , KBYTES : INT64]. + > [ AGGREGATE ] Schema: [CLICKSTREAM.IP : STRING , CLICKSTREAM.BYTES : INT64 , KSQL_AGG_VARIABLE_0 : INT64]. + > [ PROJECT ] Schema: [CLICKSTREAM.IP : STRING , CLICKSTREAM.BYTES : INT64]. + > [ REKEY ] Schema: [CLICKSTREAM.ROWTIME : INT64 , CLICKSTREAM.ROWKEY : STRING , CLICKSTREAM._TIME : INT64 , CLICKSTREAM.TIME : STRING , CLICKSTREAM.IP : STRING , CLICKSTREAM.REQUEST : STRING , CLICKSTREAM.STATUS : INT32 , CLICKSTREAM.USERID : INT32 , CLICKSTREAM.BYTES : INT64 , CLICKSTREAM.AGENT : STRING]. + > [ SOURCE ] Schema: [CLICKSTREAM.ROWTIME : INT64 , CLICKSTREAM.ROWKEY : STRING , CLICKSTREAM._TIME : INT64 , CLICKSTREAM.TIME : STRING , CLICKSTREAM.IP : STRING , CLICKSTREAM.REQUEST : STRING , CLICKSTREAM.STATUS : INT32 , CLICKSTREAM.USERID : INT32 , CLICKSTREAM.BYTES : INT64 , CLICKSTREAM.AGENT : STRING]. + + + Processing topology + ------------------- + Sub-topologies: + Sub-topology: 0 + Source: KSTREAM-SOURCE-0000000000 (topics: [clickstream]) + --> KSTREAM-MAP-0000000001 + Processor: KSTREAM-MAP-0000000001 (stores: []) + --> KSTREAM-TRANSFORMVALUES-0000000002 + <-- KSTREAM-SOURCE-0000000000 + + +DROP STREAM +----------- + +**Synopsis** + +.. code:: sql + + DROP STREAM stream_name; + +**Description** + +Drops an existing stream. + +DROP TABLE +---------- + +**Synopsis** + +.. code:: sql + + DROP TABLE table_name; + +**Description** + +Drops an existing table. + +SELECT +------ + +**Synopsis** + +.. code:: sql + + SELECT select_expr [, ...] + FROM from_item [, ...] + [ WINDOW window_expression ] + [ WHERE condition ] + [ GROUP BY grouping_expression ] + [ HAVING having_expression ]; + +**Description** + +Selects rows from a KSQL stream or table. The result of this statement +will not be persisted in a Kafka topic and will only be printed out in +the console. To stop the continuous query in the CLI press ``Ctrl-C``. + +In the above statements from_item is one of the following: + +- ``stream_name [ [ AS ] alias]`` +- ``table_name [ [ AS ] alias]`` +- ``from_item LEFT JOIN from_item ON join_condition`` + +The WHERE clause can refer to any column defined for a stream or table, including the two implicit columns `ROWTIME` +and `ROWKEY`. + +Example: + +.. code:: sql + + SELECT * FROM pageviews + WHERE ROWTIME >= 1510923225000 + AND ROWTIME <= 1510923228000; + +**Tip:** If you want to select older data, you can configure KSQL to query the stream from the beginning. You must +run this configuration before running the query: + +.. 
code:: sql + + SET 'auto.offset.reset' = 'earliest'; + +The WINDOW clause lets you control how to *group input records that have +the same key* into so-called *windows* for operations such as +aggregations or joins. Windows are tracked per record key. KSQL supports +the following WINDOW types: + +- **TUMBLING**: Tumbling windows group input records into fixed-sized, + non-overlapping windows based on the records’ timestamps. You must + specify the *window size* for tumbling windows. Note: Tumbling + windows are a special case of hopping windows where the window size + is equal to the advance interval. + + Example: + + .. code:: sql + + SELECT item_id, SUM(quantity) + FROM orders + WINDOW TUMBLING (SIZE 20 SECONDS) + GROUP BY item_id; + +- **HOPPING**: Hopping windows group input records into fixed-sized, + (possibly) overlapping windows based on the records’ timestamps. You + must specify the *window size* and the *advance interval* for hopping + windows. + + Example: + + .. code:: sql + + SELECT item_id, SUM(quantity) + FROM orders + WINDOW HOPPING (SIZE 20 SECONDS, ADVANCE BY 5 SECONDS) + GROUP BY item_id; + +- **SESSION**: Session windows group input records into so-called + sessions. You must specify the *session inactivity gap* parameter for + session windows. For example, imagine you set the inactivity gap to 5 + minutes. If, for a given record key such as “alice”, no new input + data arrives for more than 5 minutes, then the current session for + “alice” is closed, and any newly arriving data for “alice” in the + future will mark the beginning of a new session. + + Example: + + .. code:: sql + + SELECT item_id, SUM(quantity) + FROM orders + WINDOW SESSION (20 SECONDS) + GROUP BY item_id; + +CAST +~~~~ + +**Synopsis** + +.. code:: sql + + CAST (expression AS data_type); + +You can cast an expression’s type to a new type using CAST. Here is an +example of converting a BIGINT into a VARCHAR type: + +.. code:: sql + + -- This query converts the numerical count into a suffixed string; e.g., 5 becomes '5_HELLO' + SELECT page_id, CONCAT(CAST(COUNT(*) AS VARCHAR), '_HELLO') + FROM pageviews_enriched + WINDOW TUMBLING (SIZE 20 SECONDS) + GROUP BY page_id; + +LIKE +~~~~ + +**Synopsis** + +.. code:: sql + + column_name LIKE pattern; + +The LIKE operator is used for prefix or suffix matching. Currently KSQL +supports ``%``, which represents zero or more characters. + +Example: + +.. code:: sql + + SELECT user_id + FROM users + WHERE user_id LIKE 'santa%'; + +SHOW TOPICS +----------- + +**Synopsis** + +.. code:: sql + + SHOW | LIST TOPICS; + +**Description** + +List the available topics in the Kafka cluster that KSQL is configured +to connect to (default setting for ``bootstrap.servers``: +``localhost:9092``). + +SHOW STREAMS +------------ + +**Synopsis** + +.. code:: sql + + SHOW | LIST STREAMS; + +**Description** + +List the defined streams. + +SHOW TABLES +----------- + +**Synopsis** + +.. code:: sql + + SHOW | LIST TABLES; + +**Description** + +List the defined tables. + +SHOW QUERIES +------------ + +**Synopsis** + +.. code:: sql + + SHOW QUERIES; + +**Description** + +List the running persistent queries. + +SHOW PROPERTIES +--------------- + +**Synopsis** + +.. code:: sql + + SHOW PROPERTIES; + +**Description** + +List the :ref:`configuration settings ` that are +currently in effect. + +TERMINATE +--------- + +**Synopsis** + +.. code:: sql + + TERMINATE query_id; + +**Description** + +Terminate a persistent query. Persistent queries run continuously until +they are explicitly terminated. 
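+
+For example, the persistent query with the id ``CTAS_IP_SUM`` (the CREATE TABLE AS SELECT query
+shown in the EXPLAIN example above) could be terminated as follows:
+
+.. code:: sql
+
+    -- Stops the running query; the table it created is not dropped
+    TERMINATE CTAS_IP_SUM;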
+ +- In standalone mode, exiting the CLI will stop (think: “pause”) any + persistent queries because exiting the CLI will also stop the KSQL + server. When the CLI is restarted, the server will be restarted, too, + and any previously defined persistent queries will resume processing. +- In client-server mode, exiting the CLI will not stop persistent + queries because the KSQL server(s) will continue to process the + queries. + +(To terminate a non-persistent query use ``Ctrl-C`` in the CLI.) + +================ +Scalar functions +================ + ++------------+----------------------------------+----------------------+ +| Function | Example | Description | ++============+==================================+======================+ +| ABS | ``ABS(col1)`` | The absolute value | +| | | of a value | ++------------+----------------------------------+----------------------+ +| CEIL | ``CEIL(col1)`` | The ceiling of a | +| | | value | ++------------+----------------------------------+----------------------+ +| CONCAT | ``CONCAT(col1, '_hello')`` | Concatenate two | +| | | strings | ++------------+----------------------------------+----------------------+ +| EXTRACTJSO | ``EXTRACTJSONFIELD(message, '$.l | Given a string | +| NFIELD | og.cloud')`` | column in JSON | +| | | format, extract the | +| | | field that matches | ++------------+----------------------------------+----------------------+ +| FLOOR | ``FLOOR(col1)`` | The floor of a value | ++------------+----------------------------------+----------------------+ +| LCASE | ``LCASE(col1)`` | Convert a string to | +| | | lowercase | ++------------+----------------------------------+----------------------+ +| LEN | ``LEN(col1)`` | The length of a | +| | | string | ++------------+----------------------------------+----------------------+ +| RANDOM | ``RANDOM()`` | Return a random | +| | | DOUBLE value between | +| | | 0 and 1.0 | ++------------+----------------------------------+----------------------+ +| ROUND | ``ROUND(col1)`` | Round a value to the | +| | | nearest BIGINT value | ++------------+----------------------------------+----------------------+ +| STRINGTOTI | ``STRINGTOTIMESTAMP(col1, 'yyyy- | Converts a string | +| MESTAMP | MM-dd HH:mm:ss.SSS')`` | value in the given | +| | | format into the | +| | | BIGINT value | +| | | representing the | +| | | timestamp. | ++------------+----------------------------------+----------------------+ +| SUBSTRING | ``SUBSTRING(col1, 2, 5)`` | Return the substring | +| | | with the start and | +| | | end indices | ++------------+----------------------------------+----------------------+ +| TIMESTAMPT | ``TIMESTAMPTOSTRING(ROWTIME, 'yy | Converts a BIGINT | +| OSTRING | yy-MM-dd HH:mm:ss.SSS')`` | timestamp value into | +| | | the string | +| | | representation of | +| | | the timestamp in the | +| | | given format. 
| ++------------+----------------------------------+----------------------+ +| TRIM | ``TRIM(col1)`` | Trim the spaces from | +| | | the beginning and | +| | | end of a string | ++------------+----------------------------------+----------------------+ +| UCASE | ``UCASE(col1)`` | Convert a string to | +| | | uppercase | ++------------+----------------------------------+----------------------+ + +=================== +Aggregate functions +=================== + ++-----------------------+-----------------------+-----------------------+ +| Function | Example | Description | ++=======================+=======================+=======================+ +| COUNT | ``COUNT(col1)`` | Count the number of | +| | | rows | ++-----------------------+-----------------------+-----------------------+ +| MAX | ``MAX(col1)`` | Return the maximum | +| | | value for a given | +| | | column and window | ++-----------------------+-----------------------+-----------------------+ +| MIN | ``MIN(col1)`` | Return the minimum | +| | | value for a given | +| | | column and window | ++-----------------------+-----------------------+-----------------------+ +| SUM | ``SUM(col1)`` | Sums the column | +| | | values | ++-----------------------+-----------------------+-----------------------+ +
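+As a sketch of how these aggregate functions compose with the scalar functions listed earlier,
+the following query reuses the ``orders`` stream from the WINDOW examples above (its columns are
+assumed) and computes several aggregates per 20-second tumbling window:
+
+.. code:: sql
+
+    -- Per item and per window: total and maximum quantity, plus a
+    -- human-readable label built with CAST and CONCAT
+    SELECT item_id,
+           SUM(quantity) AS total_quantity,
+           MAX(quantity) AS max_quantity,
+           CONCAT(CAST(COUNT(*) AS VARCHAR), '_orders') AS order_count_label
+    FROM orders
+    WINDOW TUMBLING (SIZE 20 SECONDS)
+    GROUP BY item_id;
+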