Schemas

Schemas define what kind of data a collection holds and how that data is stored.

Batch currently supports three types of schemas:

JSON

JSON schemas require that the data payloads sent to your collection be valid JSON. We perform automatic schema inference on your payloads, so you can add or omit fields as needed without ever manually defining or updating your schema!

Advantages:

  • Fields can be added simply by including them in your payload. The schema is inferred and updated automatically, with no manual work on your part

  • Ideal for structured or semi-structured data

  • Allows querying of individual fields

  • Fields can be removed from your payloads, unlike protocol buffers

Disadvantages:

  • The type of a field (string, number, object, array, bool) cannot be changed once the field has been observed in your collection

      // EXAMPLE
      
      // Event 1 - GOOD
      {
          "foo": "string1",
          "baz": [1, 2, 3]
      }
      
      // A subsequent event CAN omit "baz"
      
      // Event 2 - GOOD
      {
          "foo": "boop"
      }
      
      // A subsequent event CANNOT modify the type for "baz"
      
      // Event 3 - BAD
      {
          "foo": "beep",
          "baz": "now it's a string"
      }
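To make the type rule concrete, here is a toy Python model of per-field type locking. It is a sketch under our own assumptions, not Batch's implementation: it only tracks top-level fields, it treats JSON `null` as carrying no type (the docs list no null type), and `InferredSchema` and `json_type` are hypothetical names. Replaying the three events above shows Event 1 and Event 2 accepted and Event 3 rejected.

```python
import json


def json_type(value):
    """Map a value from json.loads to the type names used in this doc (None for null)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int in Python
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return None  # JSON null


class InferredSchema:
    """Toy model (hypothetical): each top-level field's type is locked the first time it is seen."""

    def __init__(self):
        self.fields = {}  # field name -> type name

    def ingest(self, payload: str) -> None:
        event = json.loads(payload)  # raises json.JSONDecodeError on invalid JSON
        if not isinstance(event, dict):
            raise ValueError("expected a JSON object")
        # Assumption: null values carry no type and are ignored.
        typed = {k: json_type(v) for k, v in event.items() if v is not None}
        # Check every field before recording any, so a rejected event changes nothing.
        for name, kind in typed.items():
            seen = self.fields.get(name)
            if seen is not None and seen != kind:
                raise TypeError(f"field {name!r} was {seen}, got {kind}")
        for name, kind in typed.items():
            self.fields.setdefault(name, kind)  # omitted fields are simply left untouched


schema = InferredSchema()
schema.ingest('{"foo": "string1", "baz": [1, 2, 3]}')  # Event 1: accepted
schema.ingest('{"foo": "boop"}')  # Event 2: accepted, "baz" omitted
try:
    schema.ingest('{"foo": "beep", "baz": "now it\'s a string"}')  # Event 3
except TypeError as err:
    print(err)  # field 'baz' was array, got string
```

Validating the whole event before recording any new field keeps a rejected payload from half-updating the model.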

Protocol Buffers

Protocol buffer schemas allow you to send binary protobuf messages directly to your collection. Batch decodes the messages using the protobuf definitions you uploaded when creating the schema, so there is no need to decode or transform the messages on your end before sending them to us!

There are two methods for uploading your protobuf definitions to Batch:

  1. Upload a zip archive of your .proto files (not recommended). This method is less reliable than uploading a file descriptor set because it assumes your directory structure exactly matches your include paths.

  2. Upload a file descriptor set (preferred). This method avoids include-path issues entirely and ensures we can always process your protobuf definitions. To generate a .fds descriptor set file, add the following flags to your protoc call:

--include_imports
--include_source_info
-o ./protos.fds

You then upload the resulting protos.fds file when creating your schema in the Batch Console and you're all set!

You can find an example in our Makefile.
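If you would rather script the protoc call than use make, the Python sketch below builds the same invocation. It assumes `protoc` is installed and on your PATH and that your .proto files live under a hypothetical `protos/` directory; `build_protoc_command` and `generate_fds` are illustrative names. It uses protoc's long-form flags: `--descriptor_set_out=FILE` is the long form of `-o`, and `--proto_path` is the long form of `-I`.

```python
import shutil
import subprocess
from pathlib import Path


def build_protoc_command(proto_files, include_dirs, output="protos.fds"):
    """Build a protoc argument list that writes a self-contained FileDescriptorSet."""
    cmd = ["protoc"]
    for inc in include_dirs:
        cmd.append(f"--proto_path={inc}")  # long form of -I<dir>
    cmd += [
        "--include_imports",  # embed every imported .proto in the set
        "--include_source_info",  # keep comments and source locations
        f"--descriptor_set_out={output}",  # long form of -o
    ]
    cmd += [str(p) for p in proto_files]
    return cmd


def generate_fds(proto_root="protos", output="protos.fds"):
    """Compile every .proto under proto_root (hypothetical layout) into one descriptor set."""
    if shutil.which("protoc") is None:
        raise RuntimeError("protoc was not found on PATH")
    root = Path(proto_root)
    files = sorted(root.rglob("*.proto"))
    subprocess.run(build_protoc_command(files, [root], output), check=True)
    return Path(output)
```

Calling `generate_fds()` from your project root produces `protos.fds`, which is the file you upload in the Batch Console. Because `--include_imports` bundles all dependencies into one file, the upload no longer depends on your directory layout.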

Advantages:

  • Ideal for structured data

  • Allows querying of individual fields

Disadvantages:

  • Any updates to your protobuf definitions require you to re-upload them to Batch before we can accept messages containing new fields

Plain

Plain schemas are a catch-all for unstructured data. We do not perform schema inference on the contents of your data, and the contents are not indexed, so individual fields within the data are not queryable; only the payload as a whole is. For more structured data, we recommend using a JSON schema.

Advantages:

  • Your data can be in any format

Disadvantages:

  • Data within the payloads cannot be queried
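To illustrate why whole-payload querying is weaker, the Python sketch below contrasts the two approaches on one hypothetical payload. It is a simplified model, not Batch's query engine: we assume a Plain-style query behaves like a substring search over the raw bytes, and `plain_match` and `json_field_match` are hypothetical helpers.

```python
import json

# Hypothetical payload; the same bytes could be sent to a Plain or a JSON collection.
payload = b'{"user": "active_admin", "status": "inactive"}'


def plain_match(raw: bytes, needle: bytes) -> bool:
    """Plain-style query (assumed here to be a substring search): the payload is opaque."""
    return needle in raw


def json_field_match(raw: bytes, field: str, expected) -> bool:
    """JSON-style query: the payload is parsed, so a predicate can target a single field."""
    return json.loads(raw).get(field) == expected


print(plain_match(payload, b"active"))  # True: "active" occurs in the user value and in "inactive"
print(json_field_match(payload, "status", "active"))  # False: status is "inactive"
```

The substring search matches, even though no field equals "active". A field-level query gives the intended answer, which is the practical reason to prefer a JSON schema for structured data.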

Schema Inspection

You can inspect the schema Batch has inferred in console.streamdal.com.

  1. Navigate to the 'collections' section of the dashboard

  2. Click the collection you wish to inspect

  3. Click the sprocket icon at the top right

  4. Scroll down until you reach the 'Schema' section
