What’s the purpose of using a serializer before pushing data into a Kafka topic?
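
One way to approach this question: Kafka brokers store and transmit record keys and values as raw bytes, so a producer has to turn an in-memory object into bytes before sending it, and a consumer has to reverse that step. The following is a minimal sketch, assuming JSON as the wire format. The function names and the sample event are hypothetical and do not come from any Kafka client library.

```python
import json


def serialize_value(record: dict) -> bytes:
    # Convert the in-memory object to a byte string, the only form a broker stores.
    return json.dumps(record, sort_keys=True).encode("utf-8")


def deserialize_value(payload: bytes) -> dict:
    # The consumer side applies the inverse transformation.
    return json.loads(payload.decode("utf-8"))


# Hypothetical sample event, for illustration only.
event = {"sensor_id": "s-1", "temperature": 21.5}
payload = serialize_value(event)
restored = deserialize_value(payload)
```

Producer and consumer must agree on the format. That is why real deployments often pair the serializer with a shared schema.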




What does Flink provide that the Kafka consumer API doesn’t?

  1. Job recovery, state serialization, and checkpointing
  2. Handling of late-arriving data
  3. Windowing, group-by (keyed) aggregation, and watermarking
  4. Scalability and parallelism


Why Watermarking?

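  1. Late event handling: In real-world systems (e.g., Kafka, IoT devices, logs), events often arrive late or out of order due to network delays, processing bottlenecks, or retries. A watermark tells Flink how far event time has progressed, and the out-of-orderness bound configured for the watermark defines how long the job waits for stragglers before treating an event as late.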
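
The late-event idea above can be sketched in plain Python. This is a simplified model of a bounded-out-of-orderness watermark, not Flink's actual API: the class name, method names, and sample timestamps are hypothetical. It also assumes that "late" means "timestamp at or below the current watermark", whereas Flink judges lateness per window.

```python
class BoundedOutOfOrdernessWatermark:
    """Toy watermark: trails the largest event timestamp seen by a fixed bound."""

    def __init__(self, max_out_of_orderness_ms):
        self.max_out_of_orderness_ms = max_out_of_orderness_ms
        self.max_seen_ts = None

    def current(self):
        # Before any event is seen, nothing can be late.
        if self.max_seen_ts is None:
            return float("-inf")
        # Assumption: the -1 mirrors Flink's bounded-out-of-orderness strategy.
        return self.max_seen_ts - self.max_out_of_orderness_ms - 1

    def observe(self, event_ts):
        """Return True if the event is late, then advance the watermark."""
        is_late = event_ts <= self.current()
        if self.max_seen_ts is None or event_ts > self.max_seen_ts:
            self.max_seen_ts = event_ts
        return is_late


# Hypothetical event timestamps in milliseconds, arriving out of order.
wm = BoundedOutOfOrdernessWatermark(max_out_of_orderness_ms=2000)
timestamps = [1000, 4000, 3000, 1500, 7000, 4500]
late_flags = [wm.observe(ts) for ts in timestamps]
```

With a 2000 ms bound, the event at 3000 is still accepted because it trails the maximum (4000) by less than the bound, while the events at 1500 and 4500 are flagged as late.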