Overview

Streamdal is an observability platform for data streams.

! NOTICE !

docs.batch.sh / docs.streamdal.com are in the process of being converted to a new documentation platform.

Streamdal offers the foundational components necessary for building and maintaining complex distributed systems that rely on messaging systems.

We offer:

  • Messaging system introspection (we support nearly all popular messaging systems)

  • Automatic schema discovery (if using JSON)

  • In-depth protobuf support

  • Automatic event indexing, enabling granular search

  • Automatic archiving of events into a "data-science optimal" Parquet format

  • Granular replay functionality

  • Support for nearly all major messaging technologies

Take a look at the Use Cases section to get a better idea of what's possible with the Batch platform.

Streamdal is an excellent addition to your infrastructure if you make use of the actor model.

Demo Video

Components

The following components make up the Batch platform:

  1. Event collectors

    1. Services that receive your event data via either our gRPC or HTTP API (see the first sketch after this list)

  2. Message bus relayers

    1. Service/utility that collects data from your message bus and relays it to our event collectors (via gRPC)

  3. Schema inference

    1. All of the events that you send us have their schemas automatically inferred in order to generate a Hive table schema and a Parquet schema (see the second sketch after this list).

    2. The schemas evolve dynamically - as your events change, so does the schema.

  4. Storage

    1. We store your events forever in our search cache and in S3 in Parquet format.

    2. The data in the search cache is used for search; the data in S3 is used for replays.

    3. You can provide your own S3 bucket for us to write the Parquet data to, or use our S3 data lake.

      1. If you provide your own bucket, you can use Athena to query the Parquet dataset directly (see the third sketch after this list).

  5. Search

    1. All event data is indexed and can be searched using Lucene syntax.

    2. Our search interface functions as a "looking glass" - it exists purely for you to quickly sift through a large data set.

    3. Once you find the general type of data you are looking for, perform a replay or an extended search, which finds all of the matching data by scanning the Parquet data stored in the data lake.

  6. Replay

    1. Once you find the data you are looking for, you can replay that data to a destination of your choice - Kafka, Rabbit, SQS or an HTTP endpoint.
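To make the event collectors (component 1) a bit more concrete, here is a minimal sketch of sending a single JSON event to a collector over HTTP. The endpoint URL, header, and token are hypothetical placeholders rather than the documented API; consult the rest of these docs for the real collector endpoints and authentication details.

```python
import requests

# Hypothetical collector endpoint and token - placeholders only, not the
# documented API. The collectors also accept events over gRPC.
COLLECTOR_URL = "https://collector.example.com/v1/events"
API_TOKEN = "YOUR_API_TOKEN"

event = {
    "type": "order_created",
    "order_id": "12345",
    "amount_cents": 4999,
}

# Send one event as a JSON body with bearer-token auth.
resp = requests.post(
    COLLECTOR_URL,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=event,
    timeout=5,
)
resp.raise_for_status()
print("collector accepted event:", resp.status_code)
```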
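The schema inference described in component 3 can be illustrated with a small sketch: infer a flat field-to-type mapping from each JSON event and merge it into a running schema, so new fields extend the schema over time. The type names loosely follow Hive column types; this is an illustration of the idea only, not the platform's actual inference engine.

```python
import json

# Hive-style type names for common JSON value types (illustration only).
TYPE_NAMES = {bool: "boolean", int: "bigint", float: "double", str: "string"}

def infer_types(event: dict) -> dict:
    """Map each top-level field of a JSON event to a simple column type."""
    return {field: TYPE_NAMES.get(type(value), "string") for field, value in event.items()}

def merge_schema(schema: dict, event: dict) -> dict:
    """Evolve the running schema: unseen fields are added, known fields keep their type."""
    for field, type_name in infer_types(event).items():
        schema.setdefault(field, type_name)
    return schema

schema = {}
for raw in (
    '{"order_id": "12345", "amount_cents": 4999}',
    '{"order_id": "12346", "amount_cents": 1099, "coupon": "WELCOME10"}',
):
    schema = merge_schema(schema, json.loads(raw))

print(schema)
# {'order_id': 'string', 'amount_cents': 'bigint', 'coupon': 'string'}
```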
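If the Parquet data is written to your own bucket (component 4), you can point Athena at it. The sketch below submits a query with boto3 and polls for the result; the database name, table name, and results location are hypothetical placeholders for whatever your own Athena setup uses.

```python
import time
import boto3

athena = boto3.client("athena")

# Hypothetical database, table, and results location - substitute your own.
resp = athena.start_query_execution(
    QueryString="SELECT count(*) FROM events WHERE type = 'order_created'",
    QueryExecutionContext={"Database": "my_event_archive"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
query_id = resp["QueryExecutionId"]

# Poll until the query finishes, then fetch the result rows.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(rows)
```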

For additional insight into how Batch works, check out the Architecture doc.
