Track 4 Topic 1: Track Overview

This tutorial track builds the first pipeline in our system. The pipeline takes raw, unstructured data and converts it into structured records.

The pipeline, named “payment-p1,” consumes records from the topic “log1” and parses them into objects. The values in log1 are raw strings, which are hard to process directly, so we create a parser that converts them into records according to a schema.

Another purpose of parsing is to filter out invalid raw data. We will not see this effect in the tutorial, however, because all values in topic log1 currently fit the schema of the parser. Trying the filtering feature is left as an exercise: after going through the tutorial, add an invalid record to the source file and watch it appear in the error topic.
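To make the idea concrete, here is a minimal Python sketch of schema-based parsing with an error path. It is not Calabash code and does not reflect how Calabash implements its parser. The schema fields (account, amount, currency), the comma delimiter, and the function names parse_line and run are all hypothetical, chosen only for illustration. Values that fit the schema become records, and values that do not are set aside, much as invalid records end up in the error topic.

```python
# Hypothetical schema: (field name, type) pairs. The tutorial's real schema
# is designed in the schema editor and will differ from this one.
SCHEMA = [("account", str), ("amount", float), ("currency", str)]


def parse_line(raw, delimiter=","):
    """Convert one raw string into a record (dict) according to SCHEMA.

    Raises ValueError if the field count or a field type does not match.
    """
    parts = [p.strip() for p in raw.split(delimiter)]
    if len(parts) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} fields, got {len(parts)}")
    record = {}
    for (name, typ), text in zip(SCHEMA, parts):
        record[name] = typ(text)  # e.g. float("abc") raises ValueError
    return record


def run(raw_values):
    """Split raw values into parsed records and rejected values."""
    parsed, errors = [], []
    for raw in raw_values:
        try:
            parsed.append(parse_line(raw))
        except ValueError as exc:
            # Stand-in for the error topic: keep the raw value and the reason.
            errors.append({"raw": raw, "error": str(exc)})
    return parsed, errors


if __name__ == "__main__":
    good, bad = run(["acct-1, 12.50, USD", "acct-2, abc, USD"])
    print(good)
    print(bad)
```

In this sketch the second input fails because "abc" cannot be converted to a number, so it lands in the rejected list while the first input becomes a record.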

This tutorial track also demonstrates how to use the Calabash GUI to design a parser. Since the schema is the centerpiece of the parser, you will also get first-hand experience with the schema editor.