Contribution of the Flowmill eBPF Collector to OpenTelemetry #733
Comments
CC: @open-telemetry/technical-committee Following the https://github.com/open-telemetry/community/blob/b24dd0e4ab9b4391190dd20a12bf68aff2007162/CONTRIBUTING.md#donations procedure, I added this topic to the TC meeting agenda. The next meeting is on 5/19/2021. @yonch have you posted a note on Slack? I'd suggest doing it in the main channel and the specification channel to get feedback from the community.
Thank you very much, @yonch, for offering to donate this code! eBPF is powerful and exciting yet also a bit daunting, so the idea of beginning with a production-tested piece of code (written by domain experts, no less) like the Flowmill collector is certainly appealing. If it's not clear by implication, my hope is that we can find a way to accept this donation without increasing scope for other OTel components that are on the critical path to general availability for core tracing and metrics functionality in all major languages and environments.
@SergeyKanzhelev Thank you for putting this on the agenda! I'll post a note on Slack. Happy to join the TC meeting as an observer to answer any questions you might have if helpful (@jonathan Perry on CNCF slack) @bhs agree with your concern of scope; with many telemetry sources converging at the collector, scope needs to be managed. The current integration we built is based on OTLP logs. The telemetry collected by the eBPF collector maps directly to logs (today), and metrics and traces (in the future if desirable) without modification to OTel semantic conventions. So at this point the answer to your question of needed changes appears to be "none at all". If future feature requests arise that might impact collector scope, we will work with the OTel team on how to best accomplish those with minimal/low scope impact. For example, say there was a request for a processor to metricize eBPF-generated logs. Rather than adding a processor, the eBPF collector could emit metrics directly into the OTLP metrics receiver, requiring no OTel collector changes. This is not a guarantee that eBPF will never contribute to collector scope. But we don't know of any current requirements. Hope this alleviates some of the concern, happy to discuss this further if desirable!
I'm interested in the TC review and would like to be included in the discussion if possible. @SergeyKanzhelev I think I missed the 5/19 meeting. Any follow-ups?
@yonch have you had a chance to post on Slack? I'm looking for more feedback/concerns from the wider community on two aspects:
So far I haven't heard any concerns about item 2, but also no immediate interest (item 1). It would be great to gather more community feedback though.
My personal opinion, FWIW: I have met @yonch before and am a bit familiar with Flowmill, and I think, at the very highest level, it will be great to have this as part of the community.
We discussed this at the TC meeting today. Here is an update:
I believe we can close on this in two weeks by the next TC meeting. Open questions:
Another question is whether Flowmill is a registered trademark and whether this donation includes the trademark, or whether we would need to rename the project to avoid possible problems for other vendors adopting this technology.
Here is a reference to the Slack post on
Following up from yesterday's GC meetings and ongoing emails, @alolita mentioned that Amazon is looking to contribute to this Flowmill donation (and they're also involved with Pixie, which is great). Sergey and Jonathan have also jointly reached out to Pixie.
@yonch thanks much for the detailed proposal. I'm not on the TC, so I don't have context on the earlier discussions; apologies in advance if these topics have already been discussed.
Thank you for the discussion so far!
The question is not whether logs can be ingested into Elastic. I think the main question is what scenarios a customer can achieve using Flowmill without a specialized backend, or whether a specialized backend (Pixie?) is required. For example, if denormalization and pre-processing on the collector is needed to make sense of the data, we need to make that clear as part of the donation. And we need to make sure that the implementation of this additional layer will be supported by the community. @alolita please chime in here.
Either approach works for me. Not a lawyer, but I believe that if we keep the name, these trademarks need to be transferred to the CNCF:
This goes to the first question about the backend. @yonch I recommend the following as a prerequisite for the donation:
Thank you @SergeyKanzhelev, we'll start on this. Just a heads up that I am traveling the next few days so will have reduced availability.
@SergeyKanzhelev to follow up on the recommendations above, I just added a section "Reducing barriers for using network telemetry" in the issue text above, below "Next steps". It is a proposal to make it easier to leverage network telemetry with no specialized backend. Happy to iterate on it with feedback! (I put it as part of this issue so newcomers to the issue have the complete context.)
@open-telemetry/technical-committee due diligence is complete: https://docs.google.com/document/d/1CRY-GU4ENgjC9suJu4RRA984HvJ3Gxxa5e7mKx4t8ss/edit#heading=h.pwdftsvax4ni @open-telemetry/governance-committee agree with the proposal to create an eBPF working group and conditionally approve the donation based on this WG's agreement on an alternative back end and extensibility design.
Thank you @SergeyKanzhelev and team for the due diligence! Looking forward to working with folks on the eBPF working group!
@open-telemetry/technical-committee and @open-telemetry/governance-committee the eBPF workgroup is recommending adopting this contribution, and has agreed on the path forward on the topics of:
I think the recommendations give us a very good start on a trajectory towards a healthy and vibrant eBPF collector ecosystem that is well integrated with OpenTelemetry. Here is the discussion summary and recommendations: https://docs.google.com/document/d/1WkH-UVzzMOdJEhQ2d9CS-jnd8GRbm1nV4GspfxF1pGM/edit# Given the conditional approvals at the TC (in the due diligence doc) and the GC, I believe this completes all the requirements for contribution -- TC and GC, please chime in in case there are further comments. @rakyll, @mikezvi, @oazizi000, @jkowall, @mtwo and all others who participated -- thank you, and looking forward to continuing the work for the eBPF community on the WG!
I added this to the next TC meeting agenda. |
I believe we still need to complete the step to ensure the contributor (Splunk) has circulated the OTel marketing guidelines internally -- the press releases do seem like they've already flown the coop though ;)
We discussed this in the TC, and agree that this contribution is ready for integration within OpenTelemetry. Specifically want to call out three important points we discussed:
We agree that the requirements set for the contribution are met. Looking forward to your future contributions and to integrating these solutions into the OpenTelemetry ecosystem!
I wonder if we need to keep this issue open to finalize any remaining technical steps, or whether we can close it and create new issues as necessary. The repo is already created at https://github.com/open-telemetry/wg-ebpf (it may need to be renamed to remove the "wg-" prefix).
Please open new issues as needed to continue the contribution!
Splunk proposes to contribute the Flowmill Collector to the Cloud Native Computing Foundation (CNCF) OpenTelemetry project, allowing network telemetry collected via eBPF to be sent to the OpenTelemetry collector. This is intended to be an additional data source for the OpenTelemetry Collector, and Splunk intends to commit members of its team to maintaining this code and advancing its roadmap.
Github repo: https://github.com/Flowmill/flowmill-collector
What is the Flowmill Collector
The Flowmill Collector is an agent designed to collect low-level telemetry from the Linux kernel using eBPF technology. This telemetry is targeted at the Linux network subsystem, tracking the lifetimes of open sockets, conntrack entries, processes, and cgroups to collect metrics on every socket’s behavior with extremely low latency. In the future, this framework could be extended to include other areas of the kernel, such as disk I/O and performance.
Why eBPF
Network socket metrics are readily available from userspace via the "ss" command. However, polling in this manner is insufficient to build a complete, accurate picture of network behavior in a microservice environment. In modern applications, processes and containers may have extremely short lifetimes, and any approach based on polling would invariably miss connections while also generating high overhead on the host system.
eBPF provides a highly efficient mechanism to gather telemetry on sockets through kernel probes and accurately correlate them with process and cgroup information. This can be used to expose data readily available in the networking stack, such as bytes transmitted, and offer a mechanism to inspect the live data stream to identify Layer 7 information.
eBPF instrumentation, being operating system-based, does not require a sidecar or other per-service components and does not require modification of an individual service’s runtime or configuration. Instead, each operating system runs a single instance of the collector, which provides visibility into every process and container on the host. This means instrumentation can be deployed with low operator effort across an entire cluster.
Further, eBPF provides system-wide visibility: it can produce telemetry for workloads running via every orchestrator: Kubernetes pods, systemd services, non-Kubernetes Docker containers, Nomad, etc.
Why Network Telemetry is Important
The network occupies a special place in the world of observability. It serves a dual role as a source of problems, since network reliability can impact applications, and as a source of insight since application behavior can be observed from network traffic.
This means network telemetry has two critical areas of impact in an observability platform.
By tracking application behavior, network telemetry provides an accurate, complete mental model of a deployment in a system with changing service dependencies and ensures consistent collection of health metrics between services. This data is critical when starting a triage process, making significant changes like deprecating an API, or training new team members.
Network telemetry also plays a critical role in measuring the impact of infrastructure problems on distributed services. Network connectivity issues in cloud providers, misconfigured security rules, DNS problems, outages in managed services, and degraded cross-zone bandwidth can impact the reliability and performance of an application but their effects are difficult to trace back to specific services. Even when networking issues are not the root cause, a lack of sufficient visibility can cause teams to expend significant effort in ruling it out.
By collecting telemetry on every connection between hosts, containers, and processes, an observability system can map service dependencies and identify network infrastructure problems impacting distributed workloads without requiring any changes to application code or container images.
Why OpenTelemetry
Network telemetry with eBPF is a natural fit within the OpenTelemetry project for a number of reasons:
It offers users a new data source that augments the existing traces, metrics, and logs handled today.
Detailed, pairwise network telemetry is not available in OpenTelemetry today and its inclusion would give users a single place to manage telemetry collection with this new data source. This is well aligned with OpenTelemetry’s goals.
The approach with eBPF would be extensible to a wide range of data available in the Linux kernel that could improve observability and be collected with no developer intervention.
What is unique about the Flowmill implementation
Wide kernel version support. The eBPF subsystem has been steadily adding new helper functions; instrumentation relying on these new helper functions limits the kernels on which it can run. The Flowmill implementation falls back to earlier helper functions on older kernels where relevant. Supported distributions include:
CentOS, Red Hat: 7.6+
Ubuntu: 16.04+
Debian: 9+
Amazon Linux / Amazon Linux 2
Google Container Optimized OS
Other distributions with Linux kernels: 4.4+
A highly extensible code generation framework makes it easy to add instrumentation
Multiple years of testing and debugging at scale in production across thousands of instances
Optimized for low overhead operation (< 0.25% CPU / core) and network overhead (< 1% additional bandwidth).
The collector can be deployed on a running system and gathers telemetry on all pre-existing containers, processes, and sockets. This allows for collector upgrades without draining traffic.
Native container support. Collects the container's unique identifier so that telemetry can be enriched with metadata from Kubernetes, Nomad, Docker swarm, etc.
Collects sufficient information to match sockets on both ends of a connection. For example, when a Pod connects to a Service’s IP address on Kubernetes, the collector also retrieves the destination Pod IP -- where the Service actually routed traffic to. This enables further analysis to find problems in individual service instances rather than in aggregate.
Use Cases
Network observability in cloud native environments is an important extension for the OpenTelemetry project. Networks are frequently blamed for reliability and performance issues, but it can be very challenging to determine how a microservice, distributed across tens or hundreds of instances, interacts with the network. This telemetry can be used to reduce or eliminate that visibility gap.
Some of the most common applications of network telemetry in cloud native environments include:
Network reliability and performance: Understanding which microservices and which containers are affected by network problems by collecting telemetry on TCP events such as retransmissions, connection failures, and resets.
Network communication latency and costs: Understanding communication patterns of containers between zones or datacenters by attributing cross-zone, cross-region and egress traffic to specific services. This can be used to optimize latency, plan capacity, and optimize cost in the public cloud, where providers charge for cross-zone, cross-region and Internet traffic.
Service interactions: Understanding the traffic flow between services without the need for additional instrumentation or proxies to automatically build a picture of service behavior or discover unexpected or latent interactions.
Security policy monitoring: Network telemetry provides visibility into the actual traffic and can be used to build alerting or audit capabilities for traffic that does not conform to ascribed patterns. For example, is a sensitive database sending a large volume of data to the internet (potential exfiltration)? Is an internal-facing service resolving downstream services via public DNS servers rather than internal servers?
Collector Details and Architecture
The Flowmill collectors are written in C/C++ and are designed to be highly efficient in CPU consumption. The kernel collector runs on every host (e.g., as a DaemonSet on Kubernetes) and uses eBPF. Flowmill also currently includes two other collectors, the k8s and aws collectors, which retrieve metadata from the Kubernetes API server and from AWS; however, these are not described in depth since we believe we can transition to using the existing OpenTelemetry components.
The data points collected with eBPF include TCP metrics (throughput, retransmission rates, connection failure rates, reset rates, etc.), UDP metrics (throughput, packets, address changes), HTTP metrics (response codes, latency), and DNS metrics (success and error rates, latency). This is easily extensible via additional eBPF programming to other forms of network and non-network data available in the operating system.
The Flowmill kernel collector gathers information on processes, containers, and network address translation to properly account for the behavior of cloud native environments. The kernel collector also interacts with docker, where available, to retrieve metadata related to cgroup (container) creation events.
The OpenTelemetry backend sends telemetry to the OpenTelemetry collector's logs pipeline via OTLP/HTTP.
The kernel collector uses the C API of bcc to compile, load and manage eBPF programs. The eBPF instrumentation code is written in C; the collector uses bcc’s LLVM integration to compile that code to eBPF bytecode. The collector loader scripts include code that locates kernel headers if they are installed on the system, or fetches and caches kernel header packages for the running system locally if needed.
During the loading process, the collector instruments different lifecycle events of kernel entities: first containers, then processes, then NAT and then sockets. This is done in order, so users can expect that container information for a process is always available before the process event is reported, etc.
When the collector adds instrumentation for a kernel entity (e.g., containers), the collector also scans all pre-existing entities. The scan needs to be coherent with new lifecycle events reported by the kernel as the scan is progressing: the collector avoids reporting that a process terminated and afterwards reporting that the process pre-existed the agent (i.e., resulted from the scan). To get a coherent picture, the scan is done by adding eBPF probes to functions providing information in the `/proc` filesystem, and triggering these eBPF probes by interacting with `/proc`. This ensures a coherent stream of events: once a container is reported closed, there will not be another event with a later timestamp relating to the container.

The collector uses shared memory rings ("perf rings") to communicate with the kernel. eBPF assumes a separate ring per CPU core. This adds uncertainty to collection: what is the order of messages between the different rings? The collector stitches the message streams into one coherent stream, so messages are linearized. This is accomplished by (1) keeping lifecycle instrumentation probes within locked regions (so a core emitting a "socket open" event cannot race with a different core that wants to emit the "socket close" for the same socket, for example), and (2) demultiplexing messages according to timestamps from a clock source that is synchronized across cores.
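For readers less familiar with this kind of per-core merging, here is a rough sketch of the linearization idea in Go. The actual collector is written in C/C++, and the `Message` type, the batch interface, and the assumption that each ring is already sorted are illustrative only, not Flowmill's code.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Message is a hypothetical timestamped event read from one per-CPU perf ring.
type Message struct {
	Timestamp uint64 // from a clock source synchronized across cores
	Payload   string
}

// item tracks the next unread message of one ring inside the merge heap.
type item struct {
	msg  Message
	ring int // which per-CPU ring the message came from
	pos  int // index of the following message in that ring
}

type minHeap []item

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].msg.Timestamp < h[j].msg.Timestamp }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(item)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// linearize merges per-CPU rings (each already ordered by timestamp)
// into a single stream ordered by the shared clock.
func linearize(rings [][]Message) []Message {
	h := &minHeap{}
	for r, ring := range rings {
		if len(ring) > 0 {
			heap.Push(h, item{msg: ring[0], ring: r, pos: 1})
		}
	}
	var out []Message
	for h.Len() > 0 {
		next := heap.Pop(h).(item)
		out = append(out, next.msg)
		if next.pos < len(rings[next.ring]) {
			heap.Push(h, item{msg: rings[next.ring][next.pos], ring: next.ring, pos: next.pos + 1})
		}
	}
	return out
}

func main() {
	rings := [][]Message{
		{{Timestamp: 100, Payload: "cpu0: socket open"}, {Timestamp: 300, Payload: "cpu0: socket close"}},
		{{Timestamp: 200, Payload: "cpu1: retransmit"}},
	}
	for _, m := range linearize(rings) {
		fmt.Println(m.Timestamp, m.Payload)
	}
}
```

The real collector works on streams rather than batches, but the ordering argument is the same: locked probe regions guarantee per-entity ordering within a ring, and the synchronized clock lets a merge order events across rings.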
The following diagram illustrates the main components of the kernel collector:
Properties of the implementation:
Summary
The Flowmill Collector would be a valuable addition to the OpenTelemetry project. It would immediately bring a source of container-aware network telemetry to the project, which would benefit microservice platform operators. It would also provide a strong foundation in eBPF technology upon which to continue to build additional telemetry using operating system data.
Next Steps
Reducing barriers for using network telemetry. See the section below.
Integrate Flowmill AWS and k8s collector metadata with existing OpenTelemetry collectors. The Flowmill implementation includes collectors for AWS network interfaces and for k8s resources, which enrich network telemetry with workload information. This work will either deprecate the Flowmill implementations where data is redundant, or extend the existing OpenTelemetry collectors for parity.
Add processors to metricize network data. These processors will process network telemetry and create metrics from aggregations of network events, for example, producing a metric of total traffic bytes per minute from a Deployment (aggregated over all Pods on the host) to a Service (see the sketch after this list).
Prototype the Render/Flowmetry protocol with OpenTelemetry. The Flowmill collector includes a contribution of a low-overhead encoding and decoding protocol. It relies on the fact that while a protocol can evolve over time, each instance of a collector has a fixed version of the protocol, and can generate many messages on that schema. Protobuf encodes information about the fields in the protobuf into every sent buffer, which increases the cost of parsing. In Render/Flowmetry, the collector sends its message prototype once, then the receiver JIT-compiles an efficient parser (using LLVM) to achieve small message parsing overheads. We would like to benchmark Protobuf and Render/Flowmetry implementations to understand whether this technique achieves a significant reduction in parsing overhead in the OpenTelemetry setting.
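To make the metricizing item above a bit more tangible, here is a minimal sketch in Go of the kind of aggregation such a processor might perform. The `FlowLog` record and its fields are assumptions for illustration and are not the actual Flowmill or OpenTelemetry data model.

```go
package main

import (
	"fmt"
	"time"
)

// FlowLog is a hypothetical enriched network log record.
type FlowLog struct {
	Timestamp  time.Time
	Deployment string // source workload (e.g., a Kubernetes Deployment)
	Service    string // destination Service
	Bytes      uint64
}

// key identifies one time series: a (deployment, service) pair in a given minute.
type key struct {
	Deployment string
	Service    string
	Minute     time.Time
}

// metricize aggregates raw flow logs into total bytes per minute per edge,
// roughly what a "metricizing" processor would emit as a metrics stream.
func metricize(logs []FlowLog) map[key]uint64 {
	totals := make(map[key]uint64)
	for _, l := range logs {
		k := key{l.Deployment, l.Service, l.Timestamp.Truncate(time.Minute)}
		totals[k] += l.Bytes
	}
	return totals
}

func main() {
	now := time.Now()
	logs := []FlowLog{
		{now, "frontend", "checkout", 1500},
		{now.Add(10 * time.Second), "frontend", "checkout", 2500},
		{now, "frontend", "payments", 700},
	}
	for k, bytes := range metricize(logs) {
		fmt.Printf("%s -> %s @ %s: %d bytes\n", k.Deployment, k.Service, k.Minute.Format(time.RFC3339), bytes)
	}
}
```

A real processor would emit these aggregates into the collector's metrics pipeline rather than printing them.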
Reducing barriers for using network telemetry
Background
The Flowmill Collector uses eBPF (extended Berkeley Packet Filter) to collect metrics on each socket. The base dataset is ‘normalized’ in that logs are annotated with unique identifiers, but do not carry a full complement of system metadata. For example, when a new socket opens on the system, the log contains the socket ID and information about the process that opened the socket. In contrast, connection failure or packet drop logs do not repeat the information about the process, but rather only reference the socket ID (whose open message contained process information). We propose mechanisms that enrich the detailed logs, using information on socket, pid, and container that was included in previous logs.
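As an illustration of this normalization, the sketch below uses Go with hypothetical record shapes and field names (not the actual Flowmill schema): a socket-open log carries process information, while a later drop event references only the socket ID.

```go
package main

import "fmt"

// SocketOpen is a hypothetical "new socket" log: it carries the process and
// address metadata that later events for this socket will not repeat.
type SocketOpen struct {
	SocketID uint64
	PID      uint32
	Comm     string // process name
	SrcAddr  string
	DstAddr  string
}

// PacketDrop is a hypothetical follow-up event: it references only the
// socket ID and relies on the earlier SocketOpen for context.
type PacketDrop struct {
	SocketID uint64
	Dropped  uint32
}

func main() {
	open := SocketOpen{SocketID: 42, PID: 1234, Comm: "nginx", SrcAddr: "10.0.0.5:34122", DstAddr: "10.0.0.9:5432"}
	drop := PacketDrop{SocketID: 42, Dropped: 3}
	// A consumer must join on SocketID to recover which process and
	// addresses the drop belongs to; that join is the enrichment step.
	if drop.SocketID == open.SocketID {
		fmt.Printf("%d drops on socket of %s (pid %d)\n", drop.Dropped, open.Comm, open.PID)
	}
}
```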
The original approach of outputting normalized logs with parsimonious labeling has several advantages:
However, it has several disadvantages as well, notably that the data generated is not easily consumable without some post processing to correlate sockets to other metadata.
Some of the high priority labels that could be added include:
Proposal 1: Processor in the OpenTelemetry collector
This variant implements enrichment in the OpenTelemetry collector rather than modifying the Flowmill collector (described further below). While more changes are required in the OpenTelemetry collector, one benefit is an enrichment component that is configurable through the Otel config, and naturally fits into the processor ecosystem.
OpenTelemetry architecture is well suited to the enrichment problem. The collector is designed to be minimal and highly efficient, and processors can handle further transformations and enrichment. Such a processor can de-normalize the telemetry stream generated by the Flowmill collector to include additional enrichment fields.
The output of this processor would include a telemetry stream identical to the original collector but with richer, configurable labeling. This data could then be readily stored and analyzed in a logs backend or further analyzed by processors (e.g., converted to a metrics stream).
A Processor in the OpenTelemetry collector for enriching network logs would hold mappings from unique identifiers to enrichment metadata: for example, a map from a socket ID to the source and destination IP addresses and the socket's process ID, and another map from process ID to process name and the process's container ID. Normalized logs enter the processor; the processor adds selected metadata labels and outputs de-normalized logs.
There are two challenges in this Processor proposal. One is that when receiving an input log line, the processor must already hold all the metadata required to enrich that log: a container's log must precede the logs of any processes in that container, and a process's log must precede the logs of its sockets. The second challenge is the converse: knowing when metadata is no longer relevant, so memory can be reclaimed and the processor does not enrich with stale metadata.
These are exactly the challenges a backend would face when enriching this data, and the Flowmill collector was built to facilitate the solution.
The Flowmill collector ensures telemetry receivers have sufficient information to enrich telemetry by tracking telemetry “sessions”. At the beginning of each new session, the collector sends information about existing containers, processes, and sockets — a “snapshot” of existing system state. Each session is a persistent TCP connection; when the TCP connection breaks, the session ends. When the collector establishes a new TCP connection, a new telemetry session starts, with a snapshot of system state.
Logs within a session represent the changes to the underlying system’s state: when a socket closes, the collector generates a log notifying of the close, using the same socket ID as the new_socket log. Similarly processes generate close logs. To make enrichment easy, the implementation keeps the order of logs consistent across entities: all sockets close before the process they are in, all processes before their container, etc.
These properties of the Flowmill collector facilitate construction of the enriching processor. The processor maintains metadata mappings, and on each input log, the processor consults those mappings and enriches the log. After enrichment, the processor updates its metadata mappings using that log, adding and removing containers, processes, and socket mappings from its data structures for “open” and “close” logs.
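A minimal sketch of the bookkeeping just described, assuming hypothetical log kinds and field names rather than the actual Flowmill schema or the OpenTelemetry processor API: enrich each log from the current maps, then update the maps on "open"/"close" logs.

```go
package main

import "fmt"

// Hypothetical per-session metadata, keyed by the identifiers that
// normalized logs reference.
type socketMeta struct{ PID uint32 }
type processMeta struct {
	Comm        string
	ContainerID string
}

// Session holds the metadata mappings for one collector connection.
type Session struct {
	sockets   map[uint64]socketMeta
	processes map[uint32]processMeta
}

func newSession() *Session {
	return &Session{sockets: map[uint64]socketMeta{}, processes: map[uint32]processMeta{}}
}

// Log is a hypothetical normalized record; Kind determines whether it adds
// to, uses, or removes from the metadata maps.
type Log struct {
	Kind        string // "process_open", "socket_open", "socket_stats", "socket_close", "process_close"
	SocketID    uint64
	PID         uint32
	Comm        string
	ContainerID string
	Attributes  map[string]string // enrichment output
}

// enrich labels the log from the session maps, then updates the maps on
// "open"/"close" logs so later records can be enriched too.
func (s *Session) enrich(l *Log) {
	l.Attributes = map[string]string{}
	if sock, ok := s.sockets[l.SocketID]; ok {
		if proc, ok := s.processes[sock.PID]; ok {
			l.Attributes["process.name"] = proc.Comm
			l.Attributes["container.id"] = proc.ContainerID
		}
	}
	switch l.Kind {
	case "process_open":
		s.processes[l.PID] = processMeta{Comm: l.Comm, ContainerID: l.ContainerID}
	case "socket_open":
		s.sockets[l.SocketID] = socketMeta{PID: l.PID}
	case "socket_close":
		delete(s.sockets, l.SocketID) // reclaim memory, avoid stale enrichment
	case "process_close":
		delete(s.processes, l.PID)
	}
}

func main() {
	sess := newSession()
	logs := []Log{
		{Kind: "process_open", PID: 1234, Comm: "nginx", ContainerID: "c-abc"},
		{Kind: "socket_open", SocketID: 42, PID: 1234},
		{Kind: "socket_stats", SocketID: 42},
		{Kind: "socket_close", SocketID: 42},
	}
	for i := range logs {
		sess.enrich(&logs[i])
		fmt.Println(logs[i].Kind, logs[i].Attributes)
	}
}
```

Knowing when to delete entries is exactly the "stale metadata" challenge mentioned above; the close logs make it explicit.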
To make the proposal more concrete, here is a possible design. First, session tracking. We would like our Processor to receive a notification when a session starts and ends, and receive all logs for a particular session annotated with the session ID. We can implement that functionality in the Receiver component: the Receiver will support persistent connections. When a telemetry source connects to the persistent receiver, the receiver assigns a session UID, and outputs a “new session” log. The receiver then annotates logs with their session UID. When the connection terminates, the receiver emits a “session close” log.
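A rough sketch of what such a session-aware receiver could look like; it uses plain line-delimited TCP instead of the real Flowmill transport, and the session UID and record formats are invented for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"sync/atomic"
)

var sessionCounter uint64

// handleConn sketches a "persistent" receiver: one TCP connection is one
// telemetry session. Each line read from the connection is re-emitted
// annotated with the session UID, bracketed by synthetic
// "new session" / "session close" records.
func handleConn(conn net.Conn, out chan<- string) {
	defer conn.Close()
	uid := atomic.AddUint64(&sessionCounter, 1)
	out <- fmt.Sprintf("session=%d event=new_session", uid)
	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		out <- fmt.Sprintf("session=%d %s", uid, scanner.Text())
	}
	out <- fmt.Sprintf("session=%d event=session_close", uid)
}

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	fmt.Println("listening on", ln.Addr())
	out := make(chan string)
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return
			}
			go handleConn(conn, out)
		}
	}()
	// Downstream processors would key their per-session state on the UID.
	for line := range out {
		fmt.Println(line)
	}
}
```

Downstream, a processor like the sketch above would keep one Session per UID and discard it when the session-close record arrives.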
The processor maintains a map from session UID to `Session` structs. Those `Session` structs hold the metadata mappings. The Processor adds and removes Sessions from the map on "new session" and "session close" logs from the receiver. The network logs come annotated with their session UID; the processor looks up the Session struct associated with the session, and performs enrichment and metadata maintenance.

The Flowmill schema DSL (in `render/flowmill.render`) includes the `start` and `end` keywords on "new" and "close"-type messages for entities (containers, processes, sockets, etc.). The included render compiler uses templates to generate code from messages, and it is relatively straightforward to add such a template to generate the Session struct and the log processing code (i.e., check if this log is a `start` or `end`, and maintain the corresponding mapping).

Proposal 2: configurable enrichment in the Flowmill collector
The Flowmill collector already maintains metadata maps for several entities; however, there are some gaps: not all entities are tracked, and not all metadata is preserved (instead, today some metadata is relayed to the receiver and "forgotten" in the collector). In this proposal variant, the mappings are enhanced to fill the gaps and hold the metadata for enrichment. The log encoder can then selectively enrich logs with the metadata from the maps.
The project maintainers are in advanced stages of designing simpler, more efficient snapshot code that can be triggered on demand (i.e., not only on reconnections). The current design would fill the gaps in the metadata maps, and would further create a metadata "inventory" in the schema file. The new metadata maps would be directly useful for enrichment in the Flowmill collector, and would simplify generating `Session` code for an OpenTelemetry Processor.