This track first creates a Kubernetes cluster. We then expand the Kafka system deployed in the previous tutorial track with a Kafka-Connect component, which must run on the Kubernetes cluster.
The reason for adding the Kafka-Connect component is to run readers. A reader loads data from a data source in real time and saves the data to a Kafka topic. Here "real-time" means the reader runs 24x7: it automatically detects new records in the source and loads them. We will design a reader that loads from text files in cloud storage.
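To make the reader concept concrete, here is a minimal sketch of a Kafka-Connect source task that tails a text file and publishes each new line to a topic. It uses the standard Kafka-Connect `SourceTask` API, but the rest is assumption: the `file` and `topic` configuration keys are invented for illustration, the sketch reads a local file rather than cloud storage, and it is not Calabash's actual reader implementation.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

// Sketch of a source task that delivers each new line of a text file
// to a Kafka topic. Config keys "file" and "topic" are hypothetical.
public class TextFileSourceTask extends SourceTask {
    private String filename;
    private String topic;
    private long offset;  // number of lines already delivered

    @Override
    public String version() { return "0.1"; }

    @Override
    public void start(Map<String, String> props) {
        filename = props.get("file");   // hypothetical config key
        topic = props.get("topic");     // hypothetical config key
        offset = 0;
    }

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        List<SourceRecord> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(filename))) {
            String line;
            long lineNo = 0;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                if (lineNo <= offset) continue;  // skip lines seen earlier
                offset = lineNo;
                records.add(new SourceRecord(
                        Collections.singletonMap("file", filename),  // source partition
                        Collections.singletonMap("line", offset),    // source offset
                        topic, Schema.STRING_SCHEMA, line));
            }
        } catch (Exception e) {
            // A production connector would report this through the
            // Connect framework's error-handling hooks.
        }
        if (records.isEmpty()) Thread.sleep(1000);  // back off when nothing is new
        return records;
    }

    @Override
    public void stop() { }
}
```

The Connect framework calls `poll()` in a loop, which is what gives the reader its 24x7, detect-and-load behavior described above.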
Calabash offers the following readers that require Kafka-Connect:
- Text file reader
- Binary file reader
- JSON file reader
- Avro file reader
- Parquet file reader
- Microsoft Excel file reader
- Google sheets reader
- JDBC query reader
Calabash also offers a unique reader that does not require Kafka-Connect: the "API Service" reader. An API Service reader creates a secure REST API that accepts POST or PUT messages and saves the received data to a Kafka topic. Using the API Service reader, you can further insulate your Kafka system from the outside world. It can also simplify logging to Kafka, because posting to a REST API is much easier than producing data to Kafka directly.
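The sketch below illustrates the API-Service-reader pattern under stated assumptions: it stands up a plain HTTP endpoint (the real reader is a secured REST API), accepts POST or PUT bodies, and forwards them to a Kafka topic with the standard Java producer. The port, path, broker address, and topic name are all placeholders.

```java
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.util.Properties;

import com.sun.net.httpserver.HttpServer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch: accept POST/PUT bodies over HTTP and forward them to Kafka.
// Security (TLS, authentication) is omitted for brevity.
public class ApiServiceReaderSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/ingest", exchange -> {  // assumed path
            String method = exchange.getRequestMethod();
            if (method.equals("POST") || method.equals("PUT")) {
                try (InputStream in = exchange.getRequestBody()) {
                    String body = new String(in.readAllBytes());
                    // Forward the request body to an assumed topic name.
                    producer.send(new ProducerRecord<>("ingest-topic", body));
                }
                exchange.sendResponseHeaders(200, -1);
            } else {
                exchange.sendResponseHeaders(405, -1);  // method not allowed
            }
            exchange.close();
        });
        server.start();
    }
}
```

A client then only needs an HTTP POST to get data into Kafka, with no Kafka client libraries or broker credentials on its side, which is the insulation benefit described above.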
We will create an API Service reader in Track 9 of the tutorial.
In addition to readers, Kafka-Connect also supports writers. A writer consumes data from a Kafka topic and delivers it to an external target. Calabash offers the following types of writers (a minimal sketch of the writer pattern follows the list):
- CSV file writer
- JSON file writer
- Avro file writer
- Parquet file writer
- JDBC table writer
- Google BigQuery writer
- API target writer
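As a rough counterpart to the reader sketch above, here is a minimal Kafka-Connect sink task in the spirit of the CSV file writer: it appends each record consumed from a topic as one line of a local file. The `file` configuration key is hypothetical, and a real CSV writer would also handle column formatting, escaping, and error reporting.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Collection;
import java.util.Map;

import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

// Sketch of a sink task that writes consumed records to a local file.
public class CsvFileSinkTask extends SinkTask {
    private PrintWriter out;

    @Override
    public String version() { return "0.1"; }

    @Override
    public void start(Map<String, String> props) {
        try {
            // Append mode, so restarts do not clobber earlier output.
            out = new PrintWriter(new FileWriter(props.get("file"), true));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        for (SinkRecord record : records) {
            // A real CSV writer would flatten structured values into
            // columns; this sketch just writes the value's string form.
            out.println(record.value());
        }
        out.flush();
    }

    @Override
    public void stop() {
        if (out != null) out.close();
    }
}
```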
We will try a writer in Track 7 of the tutorial.