Track 3 Topic 1: Track Overview

In this track, we first create a Kubernetes cluster. We then expand the Kafka system deployed in the previous tutorial track with a Kafka-Connect component, which must run on the Kubernetes cluster.

We add the Kafka-Connect component to run readers. A reader loads data from a data source in real time and saves it to a Kafka topic. Here “real-time” means the reader runs 24x7: it automatically detects new records at the source and loads them. We will design a reader that loads from text files in cloud storage.
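The core of such a reader is a poll loop that remembers how far it has read and picks up only the records added since the last poll. The sketch below is a minimal illustration of that idea, with an in-memory list standing in for both the source file and the Kafka topic; the function name `poll_new_lines` and all the data are hypothetical, not part of Calabash.

```python
def poll_new_lines(file_lines, offset):
    """Return the lines added since the last poll, plus the new offset."""
    new = file_lines[offset:]
    return new, len(file_lines)

# In-memory stand-in for a Kafka topic (a real reader would produce
# to Kafka through Kafka-Connect).
topic = []

source = ["row-1", "row-2"]   # contents of a text file in cloud storage
offset = 0

# First poll: both existing records are picked up.
records, offset = poll_new_lines(source, offset)
topic.extend(records)

# A new record arrives at the source; the next poll detects only it.
source.append("row-3")
records, offset = poll_new_lines(source, offset)
topic.extend(records)

print(topic)   # all three records land on the topic, in order
```

A real 24x7 reader would run this poll inside an endless loop with a sleep between iterations, persisting the offset so it survives restarts.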

Calabash offers the following readers that require Kafka-Connect:

  • Text file reader
  • Binary file reader
  • JSON file reader
  • Avro file reader
  • Parquet file reader
  • Microsoft Excel file reader
  • Google sheets reader
  • JDBC query reader

Calabash also offers one reader that does not require Kafka-Connect: the “API Service” reader. An API Service reader creates a secure REST API that accepts POST or PUT messages and saves the received data to a Kafka topic. With the API Service reader, you can further insulate your Kafka system from the outside world. You can also simplify logging to Kafka, because posting to a REST API is much easier than producing data to Kafka directly.
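Conceptually, the API Service reader sits between HTTP clients and Kafka: it validates each incoming POST or PUT and forwards the payload to a topic. The sketch below illustrates that request-handling logic only; the function `handle_request`, the status codes, and the in-memory `topic` list are hypothetical simplifications, and the real Calabash service additionally handles security, which is not shown.

```python
import json

# In-memory stand-in for the Kafka topic behind the API.
topic = []

def handle_request(method, body):
    """Accept a POST or PUT message and forward its payload to the topic."""
    if method not in ("POST", "PUT"):
        return 405                 # method not allowed
    try:
        record = json.loads(body)
    except json.JSONDecodeError:
        return 400                 # malformed payloads are rejected
    topic.append(record)           # in reality: produce the record to Kafka
    return 200

status = handle_request("POST", '{"event": "login", "user": "alice"}')
```

From the client's point of view, logging becomes a single HTTP request; no Kafka client library, broker addresses, or producer configuration is needed.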

We will create an API Service reader in Track 9 of the tutorial.

Kafka-Connect supports writers as well as readers. A writer does the reverse of a reader: it takes data from a Kafka topic and delivers it to an external target. Calabash offers the following types of writers:

  • CSV file writer
  • JSON file writer
  • Avro file writer
  • Parquet file writer
  • JDBC table writer
  • Google BigQuery writer
  • API target writer

We will try writers in Track 7 of the tutorial.