Apache Kafka® is an event streaming platform. What does that mean?
Kafka combines three key capabilities so you can implement your use cases for event streaming end-to-end with a single battle-tested solution:
- To publish (write) and subscribe to (read) streams of events, including continuous import/export of your data from other systems.
- To store streams of events durably and reliably for as long as you want.
- To process streams of events as they occur or retrospectively (a processing sketch follows this list).
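To make the "process" capability concrete, here is a minimal, hypothetical sketch using the Kafka Streams Java API that ships with Kafka: it continuously reads events from an assumed `orders` topic, transforms each value as it arrives, and writes the results to an assumed `orders-uppercased` topic. The topic names, application id, and broker address are placeholders, not values from this document.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class OrderProcessingSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-processing-sketch"); // placeholder id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");       // placeholder broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read the event stream, transform each event as it occurs, write the result out.
        KStream<String, String> orders = builder.stream("orders");
        orders.mapValues(value -> value.toUpperCase()).to("orders-uppercased");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Close the processing topology cleanly when the JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```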
How does it work?
- Kafka is a distributed system consisting of servers and clients communicating via a high-performance TCP network protocol. It can be deployed on bare-metal hardware, virtual machines, and containers on-premise as well as in cloud environments.
- Servers:
- Kafka is run as a cluster of one or more servers that can span multiple data centers or cloud regions. Some of these servers form the storage layer, called the brokers.
- Other servers run Kafka Connect to continuously import and export data as event streams to integrate Kafka with your existing systems such as relational databases as well as other Kafka clusters.
- To let you implement mission-critical use cases, a Kafka cluster is highly scalable and fault-tolerant: if any of its servers fails, the other servers take over its work to ensure continuous operations without any data loss (see the replication sketch after this list).
- Clients:
- They allow you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner even in the case of network problems or machine failures.
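To make the fault-tolerance point concrete, the sketch below uses the Java `Admin` client to create a topic whose partitions are replicated across three brokers; with a replication factor of 3, the cluster keeps serving the topic without data loss if a single broker fails. The broker addresses and the topic name `payments` are assumptions for illustration only.

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Properties;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder addresses for a hypothetical three-broker cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                "broker1:9092,broker2:9092,broker3:9092");

        try (Admin admin = Admin.create(props)) {
            // 3 partitions, replication factor 3: each partition is stored on three
            // brokers, so the remaining brokers take over if one of them fails.
            NewTopic payments = new NewTopic("payments", 3, (short) 3);
            admin.createTopics(List.of(payments)).all().get();
        }
    }
}
```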
Main concepts and terminology:
- Producers are those client applications that publish (write) events to Kafka, and consumers are those that subscribe to (read and process) these events.
- In Kafka, producers and consumers are fully decoupled and agnostic of each other; a minimal sketch of both follows at the end of this list.
- Topics
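To make the producer/consumer decoupling concrete, here is a minimal sketch using the Java producer and consumer clients that ship with Kafka. One part publishes an event to an assumed `payments` topic; the other subscribes to that topic and reads whatever is there. Neither side knows about the other; the topic is their only shared contract. The broker address, group id, and record contents are placeholders.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ProducerConsumerSketch {
    public static void main(String[] args) {
        // --- Producer: publishes (writes) one event to the "payments" topic. ---
        // In practice this would live in its own application.
        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("payments", "alice", "paid $200"));
        }

        // --- Consumer: subscribes to (reads) events from the same topic. ---
        // It never contacts the producer directly.
        Properties consumerProps = new Properties();
        consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "payments-readers");
        consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("payments"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("key=%s value=%s%n", record.key(), record.value());
            }
        }
    }
}
```

Because the producer never waits for consumers, new consumers can be added, or existing ones can re-read stored events, without any change to the producing side.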
