Streamdal is an observability platform for data streams. It provides the foundational components needed to build and maintain complex distributed systems that rely on messaging systems.
- Messaging system introspection (we support nearly all popular messaging systems)
- Automatic schema discovery (if using JSON)
- In-depth protobuf support
- Automatic event indexing, enabling granular search
- Automatic archiving of events into a "data-science optimal" parquet format
- Granular replay functionality
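The automatic schema discovery mentioned above can be pictured roughly like this. The `infer_schema` helper below is purely illustrative (it is not part of the Batch/Streamdal API): it walks a sample JSON event and maps each field to an inferred type name.

```python
import json

# Illustrative only: a minimal sketch of JSON schema discovery,
# not the actual Batch/Streamdal implementation.
def infer_schema(event: dict) -> dict:
    """Map each field of a JSON event to an inferred type name."""
    schema = {}
    for key, value in event.items():
        if isinstance(value, bool):  # check bool before int (bool subclasses int)
            schema[key] = "boolean"
        elif isinstance(value, int):
            schema[key] = "bigint"
        elif isinstance(value, float):
            schema[key] = "double"
        elif isinstance(value, dict):
            schema[key] = infer_schema(value)  # nested struct
        elif isinstance(value, list):
            schema[key] = "array"
        else:
            schema[key] = "string"
    return schema

event = json.loads('{"user_id": 42, "email": "a@b.co", "active": true}')
print(infer_schema(event))
# {'user_id': 'bigint', 'email': 'string', 'active': 'boolean'}
```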
Take a look at the Use Cases section to get a better idea of what is possible with the Batch platform.
The following components make up the Batch platform:

- Collection: a service/utility that collects data from your message bus and relays it to our event collectors (via gRPC).
- Schema inference:
  - All of the events you send us have their schemas automatically inferred in order to generate a Hive table schema and a parquet schema.
  - Schemas evolve dynamically: as your events evolve, so does the schema.
- Storage:
  - We store your events indefinitely, both in our search cache and in S3 in parquet format.
  - The data in the search cache is used for search; the data in S3 is used for replays.
  - You can provide your own S3 bucket for us to write the parquet data to, or use our S3 data lake.
    - If you provide your own bucket, you can use Athena to query the parquet dataset directly.
- Search:
  - All event data is indexed and can be searched using Lucene syntax.
  - The search interface functions as a "looking glass": it exists purely to let you quickly sift through a large data set.
  - Once you have found the general type of data you are looking for, perform a replay or an extended search, which finds all matching data by scanning the parquet data stored in the data lake.
- Replay: once you find the data you are looking for, you can replay it to a destination of your choice: Kafka, RabbitMQ, SQS, or an HTTP endpoint.
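The schema inference and evolution steps above can be sketched as follows. The `infer_types`, `merge_schemas`, and `to_hive_ddl` helpers are hypothetical (not the Batch API); they only illustrate the idea of evolving a schema as new event shapes arrive and generating a Hive table definition for the parquet data.

```python
# Illustrative sketch of dynamic schema evolution and Hive DDL generation;
# the helpers below are hypothetical, not the Batch/Streamdal API.

def infer_types(event: dict) -> dict:
    """Infer a flat {field: hive_type} schema from one JSON event."""
    types = {}
    for key, value in event.items():
        if isinstance(value, bool):
            types[key] = "boolean"
        elif isinstance(value, int):
            types[key] = "bigint"
        elif isinstance(value, float):
            types[key] = "double"
        else:
            types[key] = "string"
    return types

def merge_schemas(current: dict, incoming: dict) -> dict:
    """Evolve the schema: new fields are added, type conflicts widen to string."""
    merged = dict(current)
    for field, hive_type in incoming.items():
        if field in merged and merged[field] != hive_type:
            merged[field] = "string"  # simplest possible widening strategy
        else:
            merged[field] = hive_type
    return merged

def to_hive_ddl(table: str, schema: dict, location: str) -> str:
    """Render the evolved schema as a Hive external table over parquet."""
    cols = ",\n  ".join(f"`{name}` {t}" for name, t in schema.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"STORED AS PARQUET\nLOCATION '{location}'"
    )

schema = infer_types({"user_id": 7, "email": "a@b.co"})
schema = merge_schemas(schema, infer_types({"user_id": 7, "plan": "pro"}))
print(to_hive_ddl("events", schema, "s3://my-bucket/events/"))
```

A table defined this way is exactly what lets Athena query the parquet dataset in your own bucket.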