Curious how big companies operate their Kafka fleets in production? This might be the repo for you:
- What issues are encountered when running Kafka in production? 📝
- How do other organisations attempt to solve these issues? 🛠️
- Why are certain approaches adopted over others? ⚖️
- What can we learn for our own use case?
- Adobe
- Agoda
- Airbnb
- Allegro
- Apple
- AppsFlyer
- BigCommerce
- Bitpanda
- Bloomberg
- Bolt
- Booking.com
- Brex
- CERN
- Cloudflare
- Cloudera
- Coinbase
- Criteo
- Datadog
- DoorDash
- Decathlon
- Deliveroo
- GoTo
- Grab
- HelloFresh
- Honeycomb
- Hubspot
- Indeed
- Klarna
- Lyft
- Michelin
- Monzo
- Morgan Stanley
- Netflix
- New Relic
- PayPal
- Platformatory
- Riskified
- Robinhood
- Shopify
- Slack
- Stripe
- Uber
- Wise
- Wix
- Yelp
- Zalando
- Zendesk
- Zopa Bank
- How Adobe Experience Platform Pipeline Became the Cornerstone of In-Flight Processing for Adobe (2019) 📚
- Moving Beyond Newtonian Reductionism in the Management of Large-Scale Distributed Systems, Part 2 (2019) 📚
- Adobe Experience Platform’s Streaming Sources and Destinations Overview and Architecture (2019) 📚
- Wins from Effective Kafka Monitoring at Adobe: Stability, Performance, and Cost Savings (2019) 📚
- Creating Adobe Experience Platform Pipeline with Kafka (2018) 📚

- How We Solve Load Balancing Challenges in Apache Kafka (2024) 📚
- How Agoda manages 1.5 Trillion Events per day on Kafka (2021) 📚
- Adding Time Lag to Monitor Kafka Consumer (2021) 📚
- How our data scientists' petabytes of data is ingested into Hadoop (from Kafka) (2021) 📚

- Leveraging Tiered Storage in Strimzi-Operated Kafka for Cost-Effective Streaming Applications (2024) 🎙️
- Balance Kafka Cluster with Zero Data Movement (2023) 🎙️
- Experiences Operating Apache Kafka® at Scale (2019) 🎙️
- Kafka as a Service A Tale of Security and Multi Tenancy (2018) 🎙️

- Four Crucial Steps to Take Before Changing Kafka Partition Key at Scale (2023) 📚
- Kafka Lag Monitoring For Human Beings (2020) 🎙️
- Apache Kafka Lag Monitoring at AppsFlyer (2020) 📚
- Managing your Kafka in an explosive growth environment (2019) 🎙️

- Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML, Data Lake, and Beyond (2023) 🎙️

- Using Apache Kafka and ksqlDB for Data Replication at Bolt (2021) 🎙️
- How Bolt Has Adopted Change Data Capture with Confluent Platform (2020) 📚
- Kewei Shang (2020) 📚

- Transactional Events Publishing At Brex (2022) 📚

- CERN IoT Kafka Pipelines (2024) 🎙️

- All about Kafka (2024) 🎙️
- Intelligent, automatic restarts for unhealthy Kafka consumers (2023) 📚
- Using Apache Kafka to process 1 trillion inter-service messages (2022) 📚

- Using Streams Replication Manager Prefixless Replication for Kafka Topic Aggregation (2024) 📚
- Streams Replication Manager Prefixless Replication (2024) 📚

- Kafka infrastructure renovation at Coinbase (2022) 📚
- How we scaled data streaming at Coinbase using AWS MSK (2021) 📚

- Managing Kafka and Data Streams at Criteo (2023) 📚
- Upgrading Kafka on a large infra, or: when moving at scale requires careful planning (2019) 📚
- How Criteo is managing one of the largest Kafka Infrastructure in Europe (2019) 📚

- Real-time Adaptive Controls for Kafka Consumers (2024) 🎙️

- Running Production Kafka Clusters in Kubernetes (2019) 🎙️

- DoorDash Empowers Engineers with Kafka Self-Serve (2024) 📚
- API-First Approach to Kafka Topic Creation (2023) 📚
- Building Scalable Real Time Event Processing with Kafka and Flink (2020) 📚
- Eliminating Task Processing Outages by Replacing RabbitMQ with Apache Kafka Without Downtime (2020) 📚

- Sink Kafka Messages to ClickHouse Using 'ClickHouse Kafka Ingestor' (2022) 📚
- When Kafka Went Offshore (2021) 📚
- Enhancing Ziggurat - The Backbone Of Gojek's Kafka Ecosystem (2021) 📚
- Handling Dead Letters in a Streaming System (2020) 📚
- How Kafka Solved a Culture Problem at Gojek (2019) 📚
- Fronting : An Armoured Car for Kafka Ingestion (2018) 📚
- Sakaar: Taking Kafka data to cloud storage at GO-JEK (2018) 📚

- Kafka on Kubernetes: Reloaded for fault tolerance (2023) 📚
- Zero trust with Kafka (2022) 📚
- How Kafka Connect helps move data seamlessly (2022) 📚
- Exposing a Kafka Cluster via a VPC Endpoint Service (2022) 📚
- Detect Fraud Successfully with GrabDefence! (2021) 🎙️
- Optimally Scaling Kafka Consumer Applications (2020) 📚

- ProtoMock: Simple Kafka Testing by Generating Mock Data from Protobuf Schemas (2023) 📚
- Renaming a Kafka topic (2023) 📚

- Scaling Telemetry Systems with Streaming (2023) 🎙️
- Lessons Learned From the Migration to Confluent Kafka (2021) 📚
- Scaling Kafka at Honeycomb (2021) 📚
- Bitten by a Kafka Bug - Postmortem (2019) 📚

- Evolving a Real-time Fraud Barrier with Kafka (2024) 🎙️

- Load-balanced Brooklin Mirror Maker: Replicating large-scale Kafka clusters at LinkedIn (2022) 📚
- TopicGC: How LinkedIn cleans up unused metadata for its Kafka clusters (2022) 📚
- How LinkedIn customizes Apache Kafka for 7 trillion messages per day (2019) 📚
- URP? Excuse You! The Three Metrics You Have to Know (2018) 🎙️
- Test Strategy for Samza/Kafka Services (2017) 📚
- Kafka Ecosystem at LinkedIn (2016) 📚
- Kafkaesque Days at LinkedIn – Part 1 (2016) 📚
- How We’re Improving and Advancing Kafka at LinkedIn (2015) 📚

- Evolution of Streaming Pipeline at Lyft (2023) 🎙️
- Building an Adaptive, Multi-Tenant Stream Bus with Kafka and Golang (2020) 📚
- Can Kafka Handle a Lyft Ride? (2020) 🎙️
- Operating Apache Kafka Clusters 24/7 Without A Global Ops Team (2019) 📚
- Bulletproof Apache Kafka® with Fault Tree Analysis (2019) 🎙️
- Production Ready Kafka on Kubernetes (2019) 🎙️

- How we built a queue on top of Kafka (2024) 📚

- Designing Kafka Streams Applications (2024) 📚
- Contributing to open source software : AKHQ (2024) 📚
- How to 'Kstreamplify' : your new way to develop Kafka Streams application (2023) 📚
- From Monolithic Orchestrator to Streaming with Microservices (2023) 🎙️
- Migrate Applications from Kafka On-Premise to Confluent Cloud (2022) 📚
- The Michelin Guide: an unexpected event driven use case (2022) 📚
- Moving from orchestration to choregraphy - Part 3 (2022) 📚
- Moving from orchestration to choregraphy - Part 2 (2021) 📚
- Moving from orchestration to choregraphy - Part 1 (2021) 📚
- “The metamorphose” of our Information System by Implementing a distributed event streaming platform (2021) 📚

- Self-Hosting Kafka at Scale: Netflix's Journey and Challenges (2024) 🎙️
- Featuring Apache Kafka in the Netflix Studio and Finance World (2020) 📚
- Inca — Message Tracing and Loss Detection For Streaming Data @Netflix (2019) 📚
- Evolution of the Netflix Data Pipeline (2016) 📚
- Kafka Inside Keystone Pipeline (2016) 📚

- Scaling Data Ingestion: Overcoming Challenges with Cell Architecture (2024) 🎙️
- Keep Your Kafka Cloud Costs in Check with Showbacks (2024) 🎙️
- Tuning Apache Kafka Consumers to maximize throughput and reduce costs (2024) 📚
- 20 best practices for Apache Kafka at scale (2018) 📚
- Using Apache Kafka for Real-Time Event Processing at New Relic (2018) 📚
- Best practices and strategies for Kafka topic partitioning (2021) 📚
- AWS re:Invent 2020: How New Relic is migrating its Apache Kafka cluster to Amazon MSK (2021) 🎙️
- New Relic case: "Huge scale, small clusters: Using Cells to scale in the Cloud" (2021) 🎙️
- Monitoring Kafka without instrumentation using eBPF (2022) 🎙️
- Key Metrics To Uncover the Root Cause of Kafka Performance Anomalies (2022) 🎙️
- Reducing Impact of Single Broker Failures in Kafka (2023) 🎙️
- Go Big or Go Home: Approaching Kafka Replication at Scale (2023) 🎙️
- Mitigating Kafka Broker ‘Gray’ Failures For Key Based Partitioners With Partition Multihoming (2023) 🎙️
- Monitoring Apache Kafka for cloud cost reduction (2023) 📚

- Scaling Kafka to Support PayPal’s Data Growth (2023) 📚
- Scaling Kafka Consumer for Billions of Events (2021) 📚
- Marching Toward a Trillion Kafka Messages per Day: Running Kafka at scale at PayPal (2020) 🎙️

- Pinterest Tiered Storage for Apache Kafka®️: A Broker-Decoupled Approach (2024) 📚
- Pinterest’s Journey to a Automated, Efficient, and Low-Maintenance Kafka Platform (2024) 🎙️
- Lessons Learned from Running Apache Kafka at Scale at Pinterest (2021) 📚
- How Pinterest runs Kafka at scale (2018) 📚
- Open sourcing DoctorKafka: Kafka cluster healing and workload balancing (2017) 📚

- How to Manage Schemas and Handle Standardization (2023) 📚
- How to Roll Your Kafka Cluster With Zero Downtime and No Data Loss (2023) 📚
- Know Your Limits: Cluster Benchmarks (2022) 📚
- Let’s Make Your CFO Happy; A Practical Guide for Kafka Cost Reduction (2022) 🎙️
- From AWS CloudFormation to Terraform: Migrating Apache Kafka (2021) 📚

- Robinhood’s Kafka Journey from EC2 to Kubernetes (2024) 🎙️
- Robinhood’s Kafkaproxy: Decoupling Kafka Consumer Logic from Application Business Logic (2023) 🎙️
- Tackling Kafka, with a Small Team (2019) 🎙️

- Resilient Kafka: How DNS Traffic Management and Client Wrappers Ensure Availability (2023) 🎙️
- Capturing Every Change From Shopify’s Sharded Monolith (2021) 📚
- Running Apache Kafka on Kubernetes at Shopify (2018) 📚
- Kafka Producer Pipeline for Ruby on Rails (2014) 📚

- Building Self-driving Kafka clusters using open source components (2022) 📚
- Building Self-driving Kafka clusters using open source components (2022) 📚

- Mastering Kafka at Scale: Unleashing the Power of Temporal at Stripe | Replay 2023 (2023) 🎙️
- 6 Nines: How Stripe keeps Kafka highly-available across the globe (2022) 🎙️

- Protobuf Support in Uber's Real-Time Data Stack (2024) 🎙️
- Topic Federation: Enhance Kafka Availabilty with Sharded Topics Across Clusters (2024) 🎙️
- Introduction to Kafka Tiered Storage at Uber (2024) 📚
- Exactly-Once Stream Processing at Scale in Uber (2024) 🎙️
- Learnings of Running Kafka Tiered Storage at Scale (2023) 🎙️
- Securing Kafka® Infrastructure at Uber (2022) 📚
- Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot (2021) 📚
- Introducing uGroup: Uber’s Consumer Management Framework (2021) 📚
- Disaster Recovery for Multi-Region Kafka at Uber (2020) 📚
- Kafka Cluster Federation at Uber (2019) 🎙️
- Building Reliable Reprocessing and Dead Letter Queues with Apache Kafka (2018) 📚
- Introducing Chaperone: How Uber Engineering Audits Apache Kafka End-to-End (2016) 📚
- uReplicator: Uber Engineering’s Robust Apache Kafka Replicator (2016) 📚

- Streaming Infrastructure at Wise (2023) 🎙️
- Rack awareness in Kafka Streams (2022) 📚
- Teamwork: Implementing a Kafka retry strategy at Wise (2021) 📚
- Running Kafka in Kubernetes, Part 1: Why we migrated our Kafka clusters to Kubernetes. (2021) 📚
- Running Kafka in Kubernetes, Part 2: How we migrated our Kafka clusters to Kubernetes. (2021) 📚
- Securing Kafka with SPIFFE at TransferWise - Jonathan Oddy, Levani Kokhreidze (2020) 🎙️
- Achieving high availability with stateful Kafka Streams applications (2018) 📚

- 4 Steps for Kafka Rebalance - Notes From the Field (2021) 📚
- Wix’s Journey Into Data Streams (2021) 📚
- Building a High-level SDK for Kafka: Greyhound Unleashed (2020) 📚

- Kafka on PaaSTA: Running Kafka on Kubernetes at Yelp (Part 2 - Migration) (2022) 📚
- Kafka on PaaSTA: Running Kafka on Kubernetes at Yelp (Part 1 - Architecture) (2021) 📚
- Streams and Monk – How Yelp is Approaching Kafka in 2020 (2020) 📚
- Billions of Messages a Day – Yelp’s Real-time Data Pipeline (2017) 🎙️

- Rock Solid Kafka and ZooKeeper Ops on AWS (2018) 📚
- Many-to-Many Relationships Using Kafka (2018) 📚
- Event First Development - Moving Towards Kafka Pipeline Applications (2017) 📚
- Reattaching Kafka EBS in AWS (2017) 📚
- Real-time Ranking with Apache Kafka’s Streams API (2017) 📚
- Running Kafka Streams applications in AWS (2017) 📚
- A Recipe for Kafka Lag Monitoring (2017) 📚
- Surviving Data Loss (2017) 📚

- No Access Denied: Our Transition to Kafka ACLs (2024) 📚
- Seamless Transition: Migrating Kafka Cluster to Kubernetes (2024) 📚
- Kafka: Automating Root CA rotation with Vault (2023) 📚
- Implementing mTLS and Securing Apache Kafka at Zendesk (2021) 📚
- An investigation into Kafka Log Compaction (2020) 📚
- Kafka on Ruby (2020) 📚
- Create a test data generator using Kafka Connect (2018) 📚