Batch provides the foundational components you need to build and maintain complex distributed systems that rely on messaging systems.
Messaging system introspection (we support nearly all popular messaging systems)
Automatic schema discovery (if using JSON)
In-depth protobuf support
Automatic event indexing, enabling granular search
Automatic archiving of events into a "data-science optimal" parquet format
Granular replay functionality
Support for nearly all messaging system tech
Take a look at the Use Cases section for a better idea of what's possible with the Batch platform.
Batch is an excellent addition to your infra if you make use of the actor model.
The following components make up the Batch platform:
Service/utility that collects data from your message bus and relays it to our event collectors (via gRPC)
We automatically infer the schema of every event you send us and use it to generate a Hive table schema and a parquet schema.
The schemas evolve dynamically: as your events change, so do the schemas.
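To make the idea of inference and evolution concrete, here is a minimal sketch in Python. It is not Batch's implementation: the type names, the dotted-path flattening, and the widening rules (for example, promoting `int64` to `double`, or falling back to `string` on a conflict) are all assumptions chosen for illustration.

```python
from typing import Any, Dict


def infer_type(value: Any) -> str:
    """Map a decoded JSON value to an illustrative (assumed) column type."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "list"
    return "null"


def infer_schema(event: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Flatten a JSON event into {dotted.field.path: type}."""
    schema: Dict[str, str] = {}
    for key, value in event.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted path.
            schema.update(infer_schema(value, prefix=path + "."))
        else:
            schema[path] = infer_type(value)
    return schema


def merge_schemas(current: Dict[str, str], incoming: Dict[str, str]) -> Dict[str, str]:
    """Evolve the current schema with fields seen in a newer event."""
    merged = dict(current)
    for path, new_type in incoming.items():
        old_type = merged.get(path)
        if old_type is None or old_type == "null":
            merged[path] = new_type           # new field, or first real type
        elif new_type == "null" or new_type == old_type:
            continue                          # nothing to change
        elif {old_type, new_type} == {"int64", "double"}:
            merged[path] = "double"           # assumed numeric widening
        else:
            merged[path] = "string"           # assumed fallback on conflict
    return merged


if __name__ == "__main__":
    schema = infer_schema({"user": {"id": 1, "name": "ada"}, "ok": True})
    schema = merge_schemas(schema, infer_schema({"user": {"id": 1.5}, "tags": ["x"]}))
    print(schema)
```

Each new event is folded into the running schema, so fields that appear later (such as `tags` above) extend the schema instead of invalidating it.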
We store your events forever in our search cache and in S3 in parquet format.
The data in search cache is used for search; data in S3 is used for replays.
You can provide your own S3 bucket for us to write the parquet data to, or use our S3 data lake.
If you provide your own bucket, you can utilize Athena to perform queries on the parquet dataset.
All event data is indexed and can be searched using Lucene syntax.
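As an illustration of what a Lucene-style query might look like, the sketch below builds `field:value` clauses joined with `AND` and escapes characters that are special in Lucene's classic query parser. The field names (`type`, `customer.id`) are hypothetical, and the exact dialect accepted by the Batch search interface is an assumption here.

```python
import re
from typing import Any, Dict

# Characters that the classic Lucene query parser treats as special.
_LUCENE_SPECIAL = re.compile(r'([+\-!():^\[\]"{}~*?|&/\\])')


def escape_term(term: str) -> str:
    """Backslash-escape Lucene special characters in a single term."""
    return _LUCENE_SPECIAL.sub(r"\\\1", term)


def field_clause(field: str, value: Any) -> str:
    """Build one field:value clause, quoting values that contain whitespace."""
    text = str(value)
    if any(ch.isspace() for ch in text):
        # Inside a quoted phrase only backslashes and quotes need escaping.
        quoted = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'{field}:"{quoted}"'
    return f"{field}:{escape_term(text)}"


def build_query(filters: Dict[str, Any]) -> str:
    """Join all clauses with AND so every filter must match."""
    return " AND ".join(field_clause(f, v) for f, v in filters.items())


if __name__ == "__main__":
    # Hypothetical event fields, for illustration only.
    print(build_query({"type": "order_created", "customer.id": 42}))
```

Escaping matters when values come from user input: an unescaped `:` or `(` in a value would otherwise be interpreted as query syntax.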
Our search interface functions as a "looking glass" - it exists purely for you to quickly sift through a large data set.
Once you find the general type of data you are looking for, perform a replay or an extended search, which finds all matching data by scanning the parquet data stored in the data lake.
Once you find the data you are looking for, you can replay it to a destination of your choice: Kafka, RabbitMQ, SQS or an HTTP endpoint.
For additional insight into how Batch works, check out the Architecture doc.