Schemas
Schemas define what kind of data a collection holds and how that data is stored.
Batch currently supports three types of schemas:
JSON
JSON schemas require that the data payloads sent to your collection be valid JSON. We perform automatic schema inference on your payload, allowing you to add/exclude fields from your payloads as needed. No need to manually define or update your schema!
Advantages:
Fields can be added simply by including them in your payload. The schema will be inferred and updated automatically with no need for manual work on your part
Ideal for structured or semi-structured data
Allows querying of individual fields
Fields can be removed from your payloads, unlike protocol buffers
Disadvantages:
A field's type (string, number, object, array, or bool) cannot change once the field has been observed in your collection
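The inference rules above can be modelled with a short sketch. This is a hypothetical illustration, not Batch's implementation: merge_payload, SchemaConflict, the dotted field paths used for nested objects, and the choice to ignore null values are all assumptions made for the example.

```python
# Hypothetical sketch of JSON schema inference; NOT Batch's implementation.
# Models the rules above: new fields are added automatically, omitted fields
# are fine, and an observed field's type may never change.
import json
from typing import Any, Dict, Optional


class SchemaConflict(ValueError):
    """Raised when a payload changes the type of an already-observed field."""


def json_type(value: Any) -> Optional[str]:
    """Map a decoded JSON value to one of the type names used above."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return None  # null: ignored in this sketch (an assumption)


def _walk(obj: Dict[str, Any], prefix: str, out: Dict[str, str]) -> None:
    """Record the type of every field, using dotted paths for nested objects."""
    for key, value in obj.items():
        typ = json_type(value)
        if typ is None:
            continue
        path = prefix + key
        out[path] = typ
        if typ == "object":
            _walk(value, path + ".", out)


def merge_payload(schema: Dict[str, str], raw: str) -> Dict[str, str]:
    """Return a new schema that also includes the fields observed in raw."""
    doc = json.loads(raw)  # raises json.JSONDecodeError for invalid JSON
    if not isinstance(doc, dict):
        raise ValueError("this sketch expects a JSON object at the top level")
    observed: Dict[str, str] = {}
    _walk(doc, "", observed)
    merged = dict(schema)  # copy; fields absent from raw are kept untouched
    for path, typ in observed.items():
        known = merged.get(path)
        if known is not None and known != typ:
            raise SchemaConflict(f"field {path!r} was {known}, got {typ}")
        merged[path] = typ
    return merged
```

With this model, a later payload of {"id": "3"} is rejected once id has been seen as a number, while simply leaving user out of a later payload is accepted.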
Protocol Buffers
Protocol buffer schemas allow you to send binary protobuf messages directly to your collection. Batch will decode the messages using the protobuf definitions you uploaded when creating the schema. There is no need to decode or transform the messages on your end before sending them to us!
There are two methods for uploading your protobuf definitions to Batch:
Upload a zip archive of your .proto files (not recommended): this method is less reliable than uploading a file descriptor set because it assumes your directory structure exactly matches your include paths.
Upload a file descriptor set (preferred): this method avoids any issues with include paths and ensures we can always process your protobuf definitions. To generate a .fds descriptor set file, you will need to add the following flags to your protoc call:
You then upload the resulting protos.fds file when creating your schema in the Batch Console, and you're all set!
You can find an example in our Makefile.
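As a minimal sketch, the descriptor set can also be produced from Python by shelling out to protoc. It assumes protoc is installed and on your PATH, and that your definitions live under a directory you pass in (for example a hypothetical protos/ directory); protoc_fds_args and build_fds are illustrative names. The sketch uses protoc's standard --descriptor_set_out and --include_imports flags, the latter so that imported .proto files are bundled into the single protos.fds file; compare against the Makefile for the exact flags Batch uses.

```python
# Hypothetical helper that runs protoc to produce a file descriptor set.
# Assumes protoc is installed and on PATH; the names below are illustrative.
import subprocess
from pathlib import Path
from typing import List, Sequence


def protoc_fds_args(proto_root: str, proto_files: Sequence[str],
                    out: str = "protos.fds") -> List[str]:
    """Build the protoc command line for writing a descriptor set."""
    return [
        "protoc",
        f"--proto_path={proto_root}",   # directory that imports resolve against
        f"--descriptor_set_out={out}",  # write a FileDescriptorSet to `out`
        "--include_imports",            # bundle imported .proto files as well
        *proto_files,
    ]


def build_fds(proto_root: str, out: str = "protos.fds") -> Path:
    """Compile every .proto under proto_root into a single descriptor set."""
    files = sorted(str(p) for p in Path(proto_root).rglob("*.proto"))
    if not files:
        raise FileNotFoundError(f"no .proto files found under {proto_root}")
    subprocess.run(protoc_fds_args(proto_root, files, out), check=True)
    return Path(out)
```

Calling build_fds("protos") would write protos.fds to the current directory, ready to upload in the Batch Console.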
Advantages:
Ideal for structured data
Allows querying of individual fields
Disadvantages:
Any updates to your protobuf definitions require you to re-upload them to Batch before we can accept messages containing new fields
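The reason re-uploading is needed becomes clearer at the wire level: a binary protobuf message carries field numbers, not field names, so the decoder has to look the names up in your definitions. The sketch below is a hypothetical, stdlib-only decoder that handles only varint and length-delimited fields; the descriptor dict stands in for an uploaded descriptor set and is an assumption of the example, not how Batch stores definitions.

```python
# Hypothetical illustration of why Batch needs your protobuf definitions.
# Wire bytes carry field numbers only; names come from the descriptor.
from typing import Dict, List, Tuple, Union

Value = Union[int, bytes]


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a protobuf base-128 varint."""
    if n < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_fields(buf: bytes) -> List[Tuple[int, Value]]:
    """Split a message into (field number, raw value) pairs."""
    fields: List[Tuple[int, Value]] = []
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: strings, bytes, sub-messages
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((number, value))
    return fields


def name_fields(buf: bytes, descriptor: Dict[int, str]) -> Dict[str, Value]:
    """Attach names; numbers missing from a stale descriptor stay unnamed."""
    return {descriptor.get(num, f"<unknown field {num}>"): val
            for num, val in decode_fields(buf)}
```

Decoding the bytes 0a 03 61 62 63 10 96 01 18 07 against {1: "name", 2: "count"} yields name and count plus an unnamed field 3; adding 3 to the descriptor (the equivalent of re-uploading your definitions) gives that field a name.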
Plain
Plain schemas are a catch-all for unstructured data. We do not perform schema inference on the contents of your data. The contents of your data are not indexed, so individual fields within the data are not queryable; only the payload as a whole is. For more structured data, we recommend using a JSON schema.
Advantages:
Your data can be in any format
Disadvantages:
Data within the payloads cannot be queried
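To illustrate the difference, here is a hypothetical sketch contrasting a plain collection, where the only test available is against the whole payload (modelled here as a substring search, which is an assumption), with a JSON collection, where a single field can be compared on its own. Neither function reflects Batch's actual query engine.

```python
# Hypothetical contrast between plain and JSON payloads; NOT Batch's engine.
import json
from typing import List


def search_plain(payloads: List[bytes], needle: bytes) -> List[bytes]:
    # No fields exist, so the only possible test is against the whole payload.
    return [p for p in payloads if needle in p]


def query_json_field(payloads: List[str], field: str, expected: object) -> List[dict]:
    # Each payload (assumed to be a JSON object) is parsed, so one field
    # can be compared on its own.
    docs = [json.loads(p) for p in payloads]
    return [d for d in docs if d.get(field) == expected]
```

Searching plain payloads such as b"level=error user=42" and b"level=info user=420" for user=42 matches both, because nothing marks where a field ends; the equivalent JSON query on user == 42 returns only the matching record.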
Schema Inspection
You can inspect the schema Batch has inferred at console.streamdal.com:
1. Navigate to the 'collections' section of the dashboard
2. Click the collection you wish to inspect
3. Select the sprocket icon at the top right
4. Scroll down until you reach the 'Schema' section