Events

Overview

Events are messages that you send and receive to communicate a state change in a component or a system.
Anything can be an event - passing a message in a JavaScript application is an event and so is passing a message via Kafka or via a UDP socket.
The most common use case for events is to pass them through a message system such as RabbitMQ, Kafka or SQS to facilitate asynchronous work.
While any piece of data can be an event, you should strive to have well-composed events. A well-composed event should:
  • Use a common event envelope
  • Indicate the event type
  • Not use ambiguous field names
  • Not reuse the same field names across different event types
  • Be easy to parse by a computer
  • Be type safe
NOTE: Batch uses the terms "events" and "messages" interchangeably throughout the platform - they are one and the same.

Message Passing Patterns

There are many ways to facilitate message passing.
The most common way to pass messages is to use a messaging system such as RabbitMQ, Kafka or SQS.
More advanced use-cases may utilize technologies such as ZeroMQ, Nanomsg, Elixir's built-in RPC or pure gRPC.

Why utilize message passing?

There are many benefits to using a message passing mechanism but the primary reason is to gain additional reliability in your distributed system.
The quickest way to demonstrate this is through an example.

Example

Traditionally, if you (application A) want another application (B) to do some work, you would have A call B via a REST API. This is easy and it works but soon you may find out that:
  • Networks are unreliable and 1% of your requests don't complete
  • Application B gets deployed way more often than you thought
  • Application B gets restarted every night due to a memory leak
  • Application B is rate-limited to 5 requests per second (while application A often hits 100+ requests per second)
This all translates to poor service reliability.

A better approach

A way to improve service reliability would be to utilize an asynchronous approach that involves message passing.
  • Application A produces a message to a message queue
  • The message contains ALL of the necessary information for another application to be able to process the request and produce a result somewhere
  • Application B eventually picks up the message, processes it and produces a result by emitting another event that application A picks up
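
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production setup: two in-process queues stand in for a real broker such as RabbitMQ, Kafka or SQS, and the event names ("charge_requested", "charge_completed") and fields are hypothetical.

```python
import json
import queue
import uuid

# In-process stand-ins for a real message broker (assumption for illustration).
request_queue: "queue.Queue[str]" = queue.Queue()
result_queue: "queue.Queue[str]" = queue.Queue()


def application_a_produce(account_id: str, amount_cents: int) -> str:
    """Application A: publish a self-contained request and return immediately."""
    request_id = str(uuid.uuid4())
    message = {
        "type": "charge_requested",  # hypothetical event type
        "request_id": request_id,
        "account_id": account_id,
        "amount_cents": amount_cents,
    }
    request_queue.put(json.dumps(message))
    return request_id


def application_b_consume_one() -> None:
    """Application B: pick up one message when ready and emit a result event."""
    message = json.loads(request_queue.get())
    result = {
        "type": "charge_completed",  # hypothetical event type
        "request_id": message["request_id"],
        "status": "ok",
    }
    result_queue.put(json.dumps(result))


def application_a_collect_one() -> dict:
    """Application A: pick up one result event emitted by B."""
    return json.loads(result_queue.get())
```

Note that application A never calls B directly. If B is restarting or rate-limited, the request simply waits in the queue until B picks it up.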

Takeaway

By utilizing an asynchronous approach, you are:
  • Alleviating service back pressure on application B
  • Increasing successful request completion for application A
  • Changing the relationship between A and B:
    • B is no longer a hard dependency for A - it is now a soft dependency, meaning, application A can survive if B goes away
  • And.. *drumroll* .. you are beginning to use event-driven architecture

Message Envelopes

A message envelope is a common "schema" that you use for ALL events in your system. Think "namespace".
By having a common envelope, you are less likely to run into bugs and make mistakes as you construct events that are to be processed by other systems.
Just like with messaging systems, it is up to you to choose what message envelope you wish to use - all of them have their own PROS and CONS.
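
Whatever format you choose, the idea of an envelope can be sketched as a small helper that every producer calls before publishing. The sketch below assumes a JSON-style envelope with "type" and "metadata" fields; the function name wrap_event and the exact metadata fields are illustrative choices, not a Batch API.

```python
import time
import uuid
from typing import Any


def wrap_event(event_type: str, origin: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Wrap a payload in a common envelope.

    The payload is nested under a key named after the event type, so its
    fields cannot collide with envelope fields such as "type" or "metadata".
    """
    if event_type in ("type", "metadata"):
        raise ValueError(f"event type {event_type!r} collides with an envelope field")
    return {
        "type": event_type,
        "metadata": {
            "id": str(uuid.uuid4()),      # unique ID for this event
            "origin": origin,             # which system produced the event
            "unix_ts": int(time.time()),  # when the event was produced
        },
        event_type: payload,
    }
```

Because every event passes through the same helper, consumers can rely on "type" and "metadata" always being present, regardless of which team produced the event.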

JSON

By far, the most common message envelope utilizes JSON.
And for good reason:
  • It is simple
  • It is well supported in virtually all programming languages
  • And is mostly human readable
Choosing JSON for your message envelope is (probably) a good idea if:
  • You are adding asynchronous elements to an existing system
  • You are interfacing with many different programming languages
  • Establishing new standards in your org is difficult
JSON comes with its fair share of CONS though:
  • No type safety
  • Schema-enforcement is not easy
    • This can be alleviated by using something like JSON Schema but that will require org-wide adoption for it to be effective
  • Without schema-enforcement, you are more likely to end up with events that are missing critical event data (e.g. team B forgot to fill out the "payment_source" field)
  • No built-in compression; this will influence:
    • message transfer speed
    • bandwidth usage
    • storage cost
  • No automatic client or server code generation
Batch supports JSON and does automatic schema inference.
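
To make the schema-enforcement point concrete, here is a rough sketch of the kind of hand-written check that JSON consumers often end up maintaining themselves. The field list is hypothetical (only "payment_source" comes from the example above), and a real system would more likely use something like JSON Schema.

```python
import json

# Hypothetical required fields and their expected Python types.
REQUIRED_PAYMENT_FIELDS = {
    "payment_id": str,
    "payment_source": str,
    "amount_cents": int,
}


def validate_payment(raw: str) -> list[str]:
    """Return a list of problems found in a JSON-encoded payment event."""
    event = json.loads(raw)
    if not isinstance(event, dict):
        return ["event is not a JSON object"]
    problems = []
    for field, expected_type in REQUIRED_PAYMENT_FIELDS.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            problems.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return problems
```

JSON itself will happily parse an event with a missing or mistyped field. Nothing fails until a check like this one, or the business logic behind it, goes looking for the value.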

Protobuf

Protobuf is fairly complex but offers a high degree of confidence in your event quality due to built-in schema enforcement, type safety and excellent cross-language support.
Choosing Protobuf for your message envelope is (probably) a good idea if:
  • You are tasked with establishing a long-lasting, sophisticated and reliable event-driven architecture at your org
  • You MUST have schema-enforcement and type safety
  • You need rich type support
  • You are (or are planning on) using gRPC
  • You want to generate client/server code from schemas
  • You want to be able to point folks at good documentation
There are some significant CONS as well though:
  • You will have to create your own schema repository and set up a build process to generate your compiled protobufs
  • The CLI tools are complex
  • Deprecation is less-than-ideal
  • Protobuf messages are semi-human-readable - as in, you will not be able to view all values in a message via cat - you will need to properly decode the message
Batch has full support for protobuf - all fields in a protobuf message are indexed and available for search and replay.

Avro

Avro is a serialization format that is very commonly used with Kafka (Kafka itself treats messages as opaque bytes and has no default format). While it is language-neutral, it is best paired with Java and Kafka.
While it is very similar to protobuf and offers many of the same advantages, it also has some unique properties:
  • Schema evolution
    • Reads and writes are tightly coupled with schemas which enables you to have granular control over how message envelopes evolve
  • Dynamic typing
    • Along with static types, it is also possible to include untyped data
Choosing Avro for your message envelope is (probably) a good idea if:
  • You are a Java and/or Kafka shop
  • You are using Confluent's platform
  • You need both static and dynamic types
CONS:
  • Avro is not as well supported as Protobuf
  • You need the schema to read and write data (might be a PRO in some cases)
Batch has full support for Avro - all fields in an Avro message are indexed and available for search and replay.
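
The schema evolution point is easier to see with an example. The sketch below is plain Python on dictionaries, not the Avro library or its wire format. It only mimics the record-field rule that Avro applies when a reader's schema differs from the writer's: fields the reader does not know are dropped, and fields missing from the written data take the reader's default. The names resolve_record and NO_DEFAULT are made up for illustration.

```python
from typing import Any

# Sentinel meaning "the reader's schema declares no default for this field".
NO_DEFAULT = object()


def resolve_record(written: dict[str, Any], reader_fields: dict[str, Any]) -> dict[str, Any]:
    """Project a record written with an older schema onto the reader's fields.

    reader_fields maps each field name to its default value, or to NO_DEFAULT
    when the reader's schema has no default for that field.
    """
    resolved = {}
    for name, default in reader_fields.items():
        if name in written:
            resolved[name] = written[name]   # field present: use the written value
        elif default is not NO_DEFAULT:
            resolved[name] = default         # field added later: fall back to default
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return resolved                          # fields unknown to the reader are dropped
```

With this rule, adding a field with a default or removing a field does not break existing consumers, which is what gives you granular control over how an envelope evolves.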

Other

In most cases, we advise using JSON if you are new to event-driven systems and are not sure about all of your requirements.
If you are familiar with asynchronous patterns and are comfortable with distributed systems, choosing protobuf is a good call.

Event Examples

Good (JSON)

{
  "type": "order",
  "metadata": {
    "id": "f969a06c-5205-4811-9b45-f1baee3ad944",
    "origin": "billing",
    "unix_ts": 1614794613
  },
  "order": {
    "id": "8b034609-7cbe-465f-98a0-6cd1adc72dd9",
    "account_id": "ae59b2f8-832f-4073-ab50-90735d52411d",
    "action": "purchase",
    "unix_ts": 1614794613,
    "products": [
      {
        "id": "c5f03dec-35b0-43fc-9859-1f81a9a89b92",
        "quantity": 1
      },
      {
        "id": "95e2fb41-8f44-4c96-a74f-4e266ad19a90",
        "quantity": 2
      }
    ]
  }
}

Why is this a good event?

  • Clear indication of event type
  • Utilizes an event envelope that can be used to house other data
  • Low chance for top-level key collisions
  • Clear schema
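
One practical payoff of a clear event type is that a consumer can route events without inspecting the payload. A minimal sketch, assuming the envelope layout shown above (the handler registry and function names are hypothetical):

```python
import json
from typing import Any, Callable

# A handler receives the envelope metadata and the type-specific payload.
Handler = Callable[[dict[str, Any], dict[str, Any]], str]


def handle_order(metadata: dict[str, Any], order: dict[str, Any]) -> str:
    return f"order {order['id']} ({order['action']}) from {metadata['origin']}"


# Hypothetical registry: one handler per event type.
HANDLERS: dict[str, Handler] = {"order": handle_order}


def dispatch(raw: str) -> str:
    """Route a JSON-encoded event to the handler registered for its type."""
    event = json.loads(raw)
    event_type = event["type"]
    handler = HANDLERS.get(event_type)
    if handler is None:
        raise ValueError(f"no handler for event type {event_type!r}")
    # The payload lives under a key named after the event type.
    return handler(event["metadata"], event[event_type])
```

Supporting a new event type then means registering one more handler, with no changes to the routing code.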

Bad (JSON)

{
  "id": "8b034609-7cbe-465f-98a0-6cd1adc72dd9",
  "account_id": "ae59b2f8-832f-4073-ab50-90735d52411d",
  "action": "purchase",
  "unix_ts": 1614794613,
  "products": [
    {
      "id": "c5f03dec-35b0-43fc-9859-1f81a9a89b92",
      "quantity": 1
    },
    {
      "id": "95e2fb41-8f44-4c96-a74f-4e266ad19a90",
      "quantity": 2
    }
  ]
}

Why is this a bad event?

  • No clear indication of event type
  • High chance of top-level key collisions
  • Ambiguous field names such as "id" and "action"
  • Fields are re-used between different events
  • Unclear schema
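
Several of these problems can be caught mechanically. The following is a rough sketch of a check that flags events that do not follow an envelope with "type" and "metadata" fields and a payload nested under the type name. The function name and the wording of the messages are illustrative assumptions.

```python
from typing import Any

ENVELOPE_KEYS = {"type", "metadata"}


def envelope_problems(event: dict[str, Any]) -> list[str]:
    """Return a list of ways in which an event deviates from the envelope."""
    problems = []
    if "type" not in event:
        problems.append("no event type")
    if "metadata" not in event:
        problems.append("no metadata")

    allowed = set(ENVELOPE_KEYS)
    event_type = event.get("type")
    if isinstance(event_type, str):
        if event_type not in event:
            problems.append(f"no payload under key {event_type!r}")
        allowed.add(event_type)

    # Anything else at the top level is payload data that can collide
    # with fields from other event types.
    stray = sorted(set(event) - allowed)
    if stray:
        problems.append("top-level payload fields: " + ", ".join(stray))
    return problems
```

Run against the bad event above, this reports the missing type, the missing metadata and five payload fields sitting at the top level.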
Last modified 8mo ago