Our infrastructure is designed to be highly scalable, asynchronous, and secure, with data privacy as a first-class concern. We built it from the ground up, out of many different components, to create a best-in-class experience around these objectives.
Before we get into our architecture, it may be helpful to walk through an example client application. For this example, we will cover a small event-driven application.
Typically, event-driven systems consist of:
- a message bus
- event producers
- event consumers
The message bus layer can be anything from Kafka to RabbitMQ to SQS, among many others. The order, payment, and shipping applications represent event producers; they generate information for consumers, which are typically backend applications that execute tasks based on the generated messages.
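As a minimal sketch of this pattern, here is a producer/consumer pair in Go using the segmentio/kafka-go client. The broker address, topic name, and payload are illustrative assumptions, not part of any particular Batch setup:

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	ctx := context.Background()

	// Producer side: the order service emits an event onto the bus.
	w := &kafka.Writer{
		Addr:  kafka.TCP("localhost:9092"), // assumed broker address
		Topic: "orders",
	}
	if err := w.WriteMessages(ctx, kafka.Message{
		Key:   []byte("order-1234"),
		Value: []byte(`{"event":"order_created","total":42.50}`),
	}); err != nil {
		log.Fatal(err)
	}
	w.Close()

	// Consumer side: a backend worker reacts to each event.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		GroupID: "shipping-worker",
		Topic:   "orders",
	})
	defer r.Close()

	m, err := r.ReadMessage(ctx)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("consumed %s: %s", m.Key, m.Value)
}
```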
Connecting to Batch
The replay process allows clients to use the search API to filter and replay specific events or all events they have stored in a collection.
Client destinations can range from simple HTTP endpoints to Kafka, RabbitMQ, or SQS.
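To make the flow concrete, here is a hypothetical replay request in Go. The endpoint, field names, and query syntax below are invented for illustration and will not match the real Batch API; consult the API docs for the actual shape:

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Hypothetical request body: filter a collection with a search
	// query and replay the matching events to an HTTP destination.
	body := []byte(`{
		"collection_id": "orders",
		"query": "payload.status == 'failed'",
		"destination": {"type": "http", "url": "https://example.com/replay"}
	}`)

	resp, err := http.Post(
		"https://api.example.com/v1/replays", // hypothetical endpoint
		"application/json",
		bytes.NewReader(body),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("replay request status:", resp.Status)
}
```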
Below is a full overview of our architecture, from the client-side plumber deployment all the way down to our replay service.
- The writers do a lot of heavy lifting (a sketch of this loop follows the list):
  - Pull messages off our cache (Kafka)
  - Generate optimally formatted parquet data
  - Write the messages to search
  - Write (partitioned) parquet data to S3
  - Update our internal metrics service with statistics about the collected messages/events
- Our S3 storage is a highly optimized data lake storing a copy of all messages, formatted as parquet files.
- SearchCache is a large cluster of servers designed to quickly return results over your most recent data.
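Here is a rough sketch of that writer loop in Go. The Kafka client (segmentio/kafka-go) and S3 uploader (aws-sdk-go) calls are real APIs, but the broker address, bucket name, flush threshold, and the toParquet helper are assumptions standing in for internals that aren't public:

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
	"github.com/segmentio/kafka-go"
)

// toParquet is a hypothetical stand-in for the real encoding step; in
// practice a parquet library (e.g. xitongsys/parquet-go) would build a
// columnar file from the buffered messages.
func toParquet(msgs []kafka.Message) []byte {
	var buf bytes.Buffer
	for _, m := range msgs {
		buf.Write(m.Value) // placeholder: real code emits parquet pages
	}
	return buf.Bytes()
}

func main() {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"}, // assumed broker address
		GroupID: "writer",
		Topic:   "events",
	})
	defer r.Close()

	up := s3manager.NewUploader(session.Must(session.NewSession()))

	var batch []kafka.Message
	for {
		m, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Fatal(err)
		}
		batch = append(batch, m)
		if len(batch) < 1000 { // flush threshold is illustrative
			continue
		}

		// Partition the S3 key by collection and date so a replay can
		// prune irrelevant files instead of scanning the whole lake.
		key := fmt.Sprintf("collection=events/dt=%s/%d.parquet",
			time.Now().UTC().Format("2006-01-02"), time.Now().UnixNano())

		_, err = up.Upload(&s3manager.UploadInput{
			Bucket: aws.String("example-data-lake"), // assumed bucket
			Key:    aws.String(key),
			Body:   bytes.NewReader(toParquet(batch)),
		})
		if err != nil {
			log.Fatal(err)
		}
		batch = batch[:0]
	}
}
```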
A few final notes on the tech that Batch uses:
We utilize an event-driven architecture (with a pinch of event sourcing) to further increase service reliability.
Finally, we use Batch for our internal message system, which allows us to rebuild state if things ever go wrong.
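As a toy illustration of that event-sourcing idea (the types and events here are invented, not Batch internals), state is never stored directly; it can always be rebuilt by folding over the stored log:

```go
package main

import "fmt"

// Event is a minimal event-sourcing illustration: the log of events is
// the source of truth, and state is derived from it.
type Event struct {
	Kind   string
	Amount int
}

type Account struct{ Balance int }

// Apply folds one event into the current state.
func (a *Account) Apply(e Event) {
	switch e.Kind {
	case "deposit":
		a.Balance += e.Amount
	case "withdraw":
		a.Balance -= e.Amount
	}
}

// Rebuild reconstructs state from scratch by replaying every event,
// which is exactly what a stored message log makes possible.
func Rebuild(events []Event) Account {
	var a Account
	for _, e := range events {
		a.Apply(e)
	}
	return a
}

func main() {
	events := []Event{
		{"deposit", 100},
		{"withdraw", 30},
		{"deposit", 5},
	}
	fmt.Printf("rebuilt balance: %d\n", Rebuild(events).Balance) // 75
}
```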