Track 3 Topic 1: Track Overview

In this track, we first create a Kubernetes cluster. We then expand the Kafka system deployed in the previous tutorial track with a Kafka-Connect component, which must run on the Kubernetes cluster.

We add the Kafka-Connect component to run readers. A reader loads data from a data source in real time and saves it to a Kafka topic. Here “real-time” means the reader runs 24x7: it automatically detects new records at the source and loads them. We will design a reader that loads from text files in cloud storage.
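The core of such a reader is a poll loop that remembers how far it has read and picks up only the records added since the last poll. The sketch below is a minimal illustration of that idea, with an in-memory list standing in for both the source file and the Kafka topic; the function name `poll_new_lines` and all the data are hypothetical, not part of Calabash.

```python
def poll_new_lines(file_lines, offset):
    """Return the lines added since the last poll, plus the new offset."""
    new = file_lines[offset:]
    return new, len(file_lines)

# In-memory stand-in for a Kafka topic (a real reader would produce
# to Kafka through Kafka-Connect).
topic = []

source = ["row-1", "row-2"]   # contents of a text file in cloud storage
offset = 0

# First poll: both existing records are picked up.
records, offset = poll_new_lines(source, offset)
topic.extend(records)

# A new record arrives at the source; the next poll detects only it.
source.append("row-3")
records, offset = poll_new_lines(source, offset)
topic.extend(records)

print(topic)   # all three records land on the topic, in order
```

A real 24x7 reader would run this poll inside an endless loop with a sleep between iterations, persisting the offset so it survives restarts.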

Calabash offers the following readers that require Kafka-Connect:

  • Text file reader
  • Binary file reader
  • JSON file reader
  • Avro file reader
  • Parquet file reader
  • Microsoft Excel file reader
  • Google sheets reader
  • JDBC query reader

Calabash also offers one reader that does not require Kafka-Connect: the “API Service” reader. An API Service reader creates a secure REST API that accepts POST or PUT messages and saves the received data to a Kafka topic. With the API Service reader, you can further insulate your Kafka system from the outside world. You can also simplify logging to Kafka, because posting to a REST API is much easier than producing data to Kafka directly.
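Conceptually, the API Service reader sits between HTTP clients and Kafka: it validates each incoming POST or PUT and forwards the payload to a topic. The sketch below illustrates that request-handling logic only; the function `handle_request`, the status codes, and the in-memory `topic` list are hypothetical simplifications, and the real Calabash service additionally handles security, which is not shown.

```python
import json

# In-memory stand-in for the Kafka topic behind the API.
topic = []

def handle_request(method, body):
    """Accept a POST or PUT message and forward its payload to the topic."""
    if method not in ("POST", "PUT"):
        return 405                 # method not allowed
    try:
        record = json.loads(body)
    except json.JSONDecodeError:
        return 400                 # malformed payloads are rejected
    topic.append(record)           # in reality: produce the record to Kafka
    return 200

status = handle_request("POST", '{"event": "login", "user": "alice"}')
```

From the client's point of view, logging becomes a single HTTP request; no Kafka client library, broker addresses, or producer configuration is needed.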

We will create an API Service reader in Track 9 of the tutorial.

Kafka-Connect supports writers as well as readers. A writer does the reverse of a reader: it takes data from a Kafka topic and delivers it to an external target. Calabash offers the following types of writers:

  • CSV file writer
  • JSON file writer
  • Avro file writer
  • Parquet file writer
  • JDBC table writer
  • Google BigQuery writer
  • API target writer

We will try writers in Track 7 of the tutorial.