[RFC] Integrations Design #3412
Comments
Regarding index naming, I would like to see the naming structure updated to account for 'tenant' and 'version'.
As the schema will evolve it will be important to have versioning in the naming schema, which should be coupled with standard mapping definitions. As the schema evolves - so too would the mappings.
hi @ryn9
I hadn't seen that Naming RFC - I will drop the same comment over there.
Concerning your definition of o11y:
May I suggest using the established definition by CNCF instead, since it covers the feedback loop as well? This would be the text:
Integrations
Contents
Highlights
Design-Details
Introduction
Integration is a new type of logical component that allows high level composition of multiple Dashboards / Applications / Queries and more.
Integrations can be used to bring together all the metrics and logs from the infrastructure and gain insight into the unified system as a whole.
Some products describe integrations as consisting of the following parts:
This RFC will only address the last part, which includes dashboards. Introducing this concept will allow OpenSearch Dashboards to be used in a much broader way using pre-canned components (such as display elements and queries).
Dashboard users who are interested in understanding and analyzing their infrastructure components will be able to search for these components in our integration repository and add them to their systems.
Such integrations can include infrastructure components such as AWS's EKS, ELB, ECS and many more...
Once integrated, the bundled dashboards and queries deliver dedicated observability and accessibility into the system for better understanding and monitoring.
An Integration is tightly coupled with a schema that represents the data it covers; in the Observability use case the schema relates to Traces, Logs, Metrics and Alerts.
An Integration for security-related dashboards and data would concern itself with the types and relationships that address that domain.
Background
Observability is the capability to continuously generate and discover actionable insights based on signals from the system under observation. In other words, observability allows users to understand a system’s state from its external output and take (corrective) action.
Observability telemetry signals (logs, metrics, traces, alerts) arriving from the system contain all the information needed to observe and monitor it.
Modern applications can have a complicated distributed architecture that combines cloud-native and microservices layers. Each layer produces telemetry signals that may have different structures and information.
Using an Observability telemetry schema, we can organize, correlate and investigate system behavior in a standard and well-defined manner.
Observability telemetry schema defines the following components - logs, traces, metrics and alerts.
Logs provide comprehensive system details, such as a fault and the specific time when the fault occurred. By analyzing the logs, one can troubleshoot code and identify where and why the error occurred.
Traces represent the entire journey of a request or action as it moves through all the layers of a distributed system. Traces allow you to profile and observe systems, especially containerized applications, serverless architectures, or microservices architecture.
Metrics provide a numerical representation of data that can be used to determine a service or component’s overall behaviour over time.
On many occasions, correlating logs, traces and metrics is mandatory in order to monitor and understand how the system is behaving. In addition, the distributed nature of the application produces multiple formats of telemetry signals arriving from different components (network router, web server, database).
For such correlation to be possible, the industry has formulated several protocols (OTEL, ECS, OpenMetrics, Alerts) for communicating these signals - the Observability schemas.
Problem definition
Today in OpenSearch, Observability and its dashboards are only partially aware (traces only) of the schematic structure of these signal types. In addition, the actual schema mapping is not present internally in the Observability plugin and has to be imported externally.
Integrating different data producers and correlating different signals is practically impossible most of the time due to missing schema definitions and correlated field names, and has to be done manually by every customer in every system.
Integration of a new Observability data source (such as NGINX / Tomcat) includes complicated configuration of both the ingestion process and the actual index store, manual discovery of the specific format of the new data source, and the crafting of dedicated dashboards for its proprietary fields.
Proposal
Our goal is to create a consolidated Observability solution. It will allow customers to ingest any type of supported telemetry data from many types of providers and to display and analyze the data in a common and unified way.
Customers using Observability expect our solution to allow simple, out-of-the-box integration and configuration.
Using a unified schema that models all the Observability components and allowing customers to add integrations would simplify the daily monitoring and incident investigation process (by using pre-canned dashboards and pre-defined correlations and alerts).
As an example of the importance of a common schema: in a multi-layered application which produces multiple log and trace signals from different software and network components, we need to address these signals using a common vocabulary. Such a vocabulary would simplify correlating information using common fields such as "process.args", "host.domain" and "observer.os".
Schema support for Observability
The Simple Schema for Observability is defined by the 4 main structured types that OpenTelemetry & ECS define and support: Logs, Traces, Metrics and Alerts.
OpenSearch's Observability Plugin will support these schema structures out of the box in the form of an index pattern per type (detailed below).
Supplement schema
Any additional index added by a customer or a 3rd-party integration component will be categorized as a supplement index. Supplement indices often present enriched Observability information that has its own schema.
These supplement indices may be used by “Schema-Aware” visualization components or queries.
Schema Aware Components
The Observability plugin is intended to allow maximum flexibility and not to impose a strict index structure on the data source. Nevertheless, the modern nature of distributed applications and the vast number of telemetry producers are changing this perception.
Today most Observability solutions (Splunk, Datadog, Dynatrace) recommend using a consolidated schema to represent the entire variance of log/trace/metric producers.
This allows the monitoring, incident investigation and correction processes to become simpler, more maintainable and reproducible.
A Schema-Aware visualization component is a component which assumes the existence of specific index/indices and expects these indices to have a specific structure - schema.
As an example we can see that Trace-Analytics is schema-aware since it directly assumes the traces & serviceMap indices exist and expects them to follow a specific schema.
This definition doesn’t change the existing status of visualization components which are not “Schema Aware”; it only clarifies which visual components would benefit from using a schema and which will remain agnostic of its content.
Operational Panels, for example, are not “schema aware” since they don’t assume in advance the existence of a specific index, nor do they expect the index they display to have a specific structure.
Schema-aware visualizations such as Applications, Metrics, Alerts and Integrations will not be able to work directly with a non-standard proprietary index unless it is explicitly mapped during query execution - this schema-on-read feature will be discussed later.
Data Model
Observability data indices themselves have a data model which they support and comply with (Traces, Logs, Metrics & Alerts); this data model is versioned to allow future evolution.
OpenSearch is aware of the existing leading Observability formats (OTEL / ECS) and should help customers use either one of the formats in the Observability Plugin.
Observability needs to allow ingestion of both formats and internally consolidate them to the best of its capabilities to present a unified Observability platform.
The data model is highly coupled with the visual components - for example, the Application visual component & Trace analytics are directly coupled with all the Observability schemas (Logs, Traces, Spans) and possibly with some Supplement schema (ServiceMap by the data-prepper ingestion pipeline).
Ingestion Pipeline
A mandatory part of an Observability solution is its ability to ingest data at scale. Currently, OpenSearch Observability supports the following out-of-the-box schematized data providers:
Data Prepper:
Indices:
- Traces data: otel-v1-apm-span-* (Data Prepper Observability Trace mapping)
- Logs data: N/A
- Metrics data: N/A
- Alerts: N/A
- Supplement: otel-v1-apm-service-map* (Proprietary Index Mapping)
Dashboards:
Jaeger:
Indices:
- Traces data: (Jaeger Observability Trace mapping)
- Alerts: N/A
Dashboards:
Observability Indices
As stated above, the Observability indices for collecting the 4 main telemetry types are:
Observability index naming
Observability will allow ingestion of both leading Observability formats (OTEL, ECS) and will internally consolidate them to the best of our capabilities to present a unified Observability platform.
The Observability indices will follow the recommended immutable data-stream ingestion pattern using the data_stream concept.
The Observability index pattern will follow the naming structure {type}-{dataset}-{namespace}, where {type} follows the sso_ schema convention and {dataset} identifies the producing resource (for example nginx.access).
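As an illustrative sketch only (the template name, index pattern and mapping below are assumptions, not part of this RFC), such an index pattern can be declared as a data stream through a composable index template:

```json
# Illustrative only - template name, pattern and mapping are not mandated by this RFC
PUT _index_template/logs-nginx.access
{
  "index_patterns": ["logs-nginx.access-*"],
  "data_stream": {},
  "priority": 100,
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" }
      }
    }
  }
}
```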
Logs Schema
see - opensearch-project/observability#1403
Traces Schema
see - opensearch-project/observability#1395
Metrics Schema
see - opensearch-project/observability#1397
Data index routing
The ingestion pipeline can route ingested Observability data (log/trace/metrics...) into a specific named index that is the source of its supporting dashboards. The {type}-{dataset}-{namespace} combination dictates the target index.
For example, if the ingested log contains the following section:
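A hypothetical sketch of such a section is shown below; the nesting follows the attributes.data_stream convention noted later, and the exact field names are illustrative:

```json
{
  "attributes": {
    "data_stream": {
      "type": "traces",
      "dataset": "mysql",
      "namespace": "prod"
    }
  }
}
```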
This indicates that the target index for this observability signal should be the traces-mysql-prod index, which follows the traces schema.

Integration index routing
Similar to the concept of index routing by the ingestion pipeline, an Integration must also declare its expected ingestion target index - for example:
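A hypothetical sketch of such a declaration inside the Integration configuration (the key names are illustrative and not fixed by this RFC):

```json
{
  "collection": "logs",
  "dataset": "nginx.access",
  "namespace": "user_domain",
  "index_pattern": "logs-nginx.access-user_domain"
}
```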
This indicates that the target index for this Integration is logs-nginx.access-user_domain - this part is declarative and helps the installation process validate that the pre-defined indices exist according to the requested patterns.
Note - data can always be routed into a specific index depending on the information under the attributes.data_stream field.

Schema driven Dashboards
OpenSearch's goal has always been to simplify and allow collaborative capabilities for the Observability plugin.
The new Integration component is responsible for allowing seamless integration of a new Observability data provider's dashboards. This includes well-structured indices, easy configuration and a common convention for ingesting multiple data sources.
Integration is an encapsulated artifact that contains the following parts (as described above)
The following workflow explains the process of activating a new Integration:
Integrating Component Structure
The following section details the structure and composition of an integration component and how it may be utilized for the Observability use-cases.
Structure
As mentioned above, integration is a collection of elements that formulate how to observe a specific data emitting resource - in our case a telemetry data producer.
A typical Observability Integration consists of the following parts:
Metadata
Display components
A major premise of this RFC is that structured data contributes enormously to the understanding of system behaviour.
Once input content has form and shape - it can and will be used to calculate and correlate different pieces of data.
The next parts of this document present Integrations for Observability, which have the Observability schema as a key concept.
They give an overview of the concepts of observability, describe the current issues customers are facing with observability, and elaborate on how to mitigate them using Integrations and structured schemas.
Integration usage workflows
The following workflows describe the end-to-end flow from the ingestion step to the discovery and analysis phase, including building and preparing an Integration and publishing it to the community.
1) Creating An Integration
Integrations are an encapsulated collection of elements and as such have a specific structure.
NginX
Let's examine the following NginX integration component:
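A minimal sketch of such a layout is shown below; the folder names follow the definitions that come next, while the individual file names are purely illustrative:

```
nginx/
├── config.json
├── display/
│   └── nginx-dashboard.ndjson
├── queries/
│   └── errors-by-status.ppl
├── schemas/
│   └── access-log-mapping.json
├── samples/
│   ├── access.log
│   └── access-translated.json
├── metadata/
│   └── security-policies.json
└── info/
    ├── README.md
    └── LICENSE
```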
Definitions
- config.json - defines the general configuration for the entire integration component.
- display - the folder in which the actual visualization components are stored.
- queries - the folder in which the actual PPL queries are stored.
- schemas - the folder in which the schemas are stored - schemas for mapping translations or index mappings.
- samples - the folder in which sample logs and their translated counterparts are present.
- metadata - the folder containing additional metadata definitions such as security and policies.
- info - the folder containing documentation, licences and external references.
The config.json file includes the following Integration configuration definitions:
- version: references the following semantic versioning:
  - integ - indicates the version of this specific Integration.
  - schema - indicates the Observability schema version.
  - resource - indicates the actual resource version which is being integrated.
- identification: references the field this Integration uses to explicitly identify the resource the signal is generated from. In this case the field resides in the instrumentationScope.attributes.identification path and should have a value that corresponds to the name of the integration.
- categories: defines the classification categories associated with this Integration according to the ECS specification.
- collection: references the different types of collections this Integration offers. It can be one of the following: {Traces, Logs, Metrics, Alerts, Supplements}.
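A hypothetical config.json sketch that follows these definitions (the concrete values and exact key spellings are illustrative, not normative):

```json
{
  "name": "nginx",
  "version": {
    "integ": "0.1.0",
    "schema": "1.0.0",
    "resource": "1.23.0"
  },
  "identification": "instrumentationScope.attributes.identification",
  "categories": ["web"],
  "collection": ["logs", "metrics"]
}
```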
Collections
Let's dive into a specific log collection (a sketch of such an entry follows the list below):
This log collects nginx access logs as described in the info section.
- input_type - a categorical classification of the log kind, which is specified in the ECS specification as well.
- dataset - defined above; indicates the target routing index.
- labels - general-purpose labeling tags that allow further correlation and associations.
- schema - the location of the mapping configuration between the original log format and the Observability Log format.
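A hypothetical sketch of such a log collection entry (the keys mirror the fields described above; the values are illustrative):

```json
{
  "collection": {
    "logs": [
      {
        "info": "nginx access logs",
        "input_type": "logfile",
        "dataset": "nginx.access",
        "labels": ["nginx", "access"],
        "schema": "./schemas/access-log-mapping.json"
      }
    ]
  }
}
```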
Display
Visualization contains the relevant visual components associated with this integration.
The visual display component will need to be validated against the schema it is expected to work on - this may be part of the Integration validation flow...
Queries
Queries contains specific PPL queries that demonstrate some common and useful use-cases.
Example:
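For instance, a minimal PPL sketch over the nginx access-log index described above (the index and field names are illustrative and depend on the installed schema):

```
source = `logs-nginx.access-prod`
| where status >= 400
| stats count() as error_count by status
```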
-- The visual display component will need to be validated against the schema it is expected to work on
Future Enhancements
In the future, an Integration would be able to supplement the Observability main SSO schemas with some proprietary schemas - such as the serviceMap which is currently defined and maintained by data-prepper.
The schema, index, label & input_type will define the explicit way the data is ingested, which schema is backing its visualizations and the tagging information it is labeled with.

2) Testing / Validating An Integration.
After the Integration is developed, it has to be tested and validated prior to publication to a shared repo.
Validation of an Integration is expected to be a build-time phase that verifies the following:
- Config validation: config.json is complete and contains all the mandatory parts.
- Collection validation: the collections elements have a compatible transformation schema, and this schema complies with the versioned SSO schema.
- Display validation: make sure all the display components have a valid JSON structure and, if they explicitly reference fields, that these fields are aligned with the SSO schema type (Traces/Metrics/Logs...).
- Query validation: make sure all the queries have a valid PPL structure and, if they explicitly reference fields, that these fields are aligned with the SSO schema type (Traces/Metrics/Logs...).
- End-to-end validation.
All these validations would use a dedicated validation & testing library supplied by the SimpleSchema plugin.
3) Publishing An Integration.
Once an integration is created and tested, it should be signed and uploaded to a shared, public, dedicated repository [the location / owners of this repository should be discussed].
Each published Integration artifact must attach the following (which will be validated during the upload):
Metadata
Samples
OpenSearch Integration Verification Review Process
Once an integration is published, it goes into an OpenSearch Integration review process.
Once an integration is reviewed and validated, it will be published among OpenSearch’s recommended Integrations and can be assembled into the complete Observability Solution.
Verification process includes running the docker sample and verifying all the display components are functioning as expected.
In the future, OpenSearch can automate this process by requiring a dedicated API or baseline queries and functionality to work on the Integration, thus automating this validation phase completely.
An integration can also be published to the public repository without the review process. Integrations not passing this process will not be bundled in the Observability release or be listed and recommended by OpenSearch. Nevertheless, they can still be manually installed in an Observability cluster, and the installing party is responsible for making sure they operate properly.
4) Using Integrations inside Observability.
This phase describes the use case of a customer using Observability: how such a customer loads different Integrations so that they can be used to easily visualize and analyze existing data using pre-canned dashboards and visualizations.
In the former part (Publishing An Integration) we defined the OpenSearch Integration Verification Review Process.
The integrations passing this process can be available out of the box once the Observability plugin is loaded. This availability means that these Integrations can be packaged together and assembled with the Observability solution.
Once Observability is distributed, it is pre-bundled with the verified Integrations. These integrations are packaged in a dedicated folder.
Integration Lifecycle
Observability bootstrap initiates the state of all the Integrations bundled with the distribution. The initial state is:

Loading - indicating the integration is still loading and has not yet been verified for runtime readiness.

Maintenance - indicating some components weren’t loaded / created as required; the appropriate info will detail the missing parts:
- Index may not exist
- Dashboard may have failed importing (name collision)
- Configuration is broken for some component and needs mending
Once the issues are corrected, the integration will transition to the Ready2Ingest stage.

→ Ready2Ingest - indicating the integration was loaded and all the needed indices / dashboards were verified as ready, but no data was found matching the expected classification filters.

→ Ready2Read - indicating the integration is populating the indices and data can be visualized and queried.

The system differentiates between the Ready2Ingest and Ready2Read phases using specific queries designed to classify the specific resource data existing in the target index.
Future Enhancements
We will be able to add further phases to the Integration lifecycle - these sub-states can be configured using expected default behaviour and policies.
Integration Development Test-Harness
In order to simplify and automate the process of validating that an Integration is compliant with OpenSearch Observability, we suggest the following test harness. The test harness is essentially an end-to-end standard that requires the developer to set up and provide the following components:
Docker compose with the following:
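As an illustration only, a minimal docker-compose sketch of such a harness might look like the following (image tags, ports, security settings and the data-prepper pipeline mount are assumptions, not part of this RFC):

```yaml
# Illustrative test-harness sketch - services, images and paths are assumptions
version: "3"
services:
  opensearch:
    image: opensearchproject/opensearch:latest
    environment:
      - discovery.type=single-node   # a single node is enough for the harness
    ports:
      - "9200:9200"
  dashboards:
    image: opensearchproject/opensearch-dashboards:latest
    ports:
      - "5601:5601"
    depends_on:
      - opensearch
  data-prepper:
    # ingestion pipeline routing the sample signals into the Observability indices
    image: opensearchproject/data-prepper:latest
    volumes:
      - ./pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml   # illustrative mount path
    depends_on:
      - opensearch
  nginx:
    # sample data producer emitting the access logs used by the Integration
    image: nginx:latest
    ports:
      - "8080:80"
```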
The test flow will have the following steps:
Initiating the Integration Pipeline
The next step is to run a series of baseline queries that are part of the verification, to prove correctness; the queries must match the existing sample data, including time and measurements.
The results of these queries (including UX-driven queries) are also compared with the expected results and verified for correctness.
This completes the test verification process and verifies that the Integration is compliant with the Observability schema and visual components.
Appendix:
Observability Physical mapping
As part of the Observability Integration, Observability will publish a schema that is conformed to by the data-prepper & fluentd plugins / libraries.
Additional information attached:
Nginx Playground Dashboard
Observability Naming standards