All data that is ingested by the Batch collectors is written as parquet files into an AWS S3 bucket and exposed via AWS Athena.
Customers who bring their own S3 bucket get direct access to the parquet data, which can then be used to power out-of-band data-science tasks.
Each collection in Batch is exposed in AWS Athena as a table. The table is fully managed and kept up to date by Batch as the schema in your collection evolves.
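Once access is enabled, a collection can be queried like any other Athena table. A minimal sketch, assuming a hypothetical database `my_database` and a collection exposed as `my_collection` (neither name comes from Batch; substitute your own):

```sql
-- Hypothetical names: replace my_database and my_collection with your own.
SELECT *
FROM my_database.my_collection
LIMIT 10;
```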
To get access to Athena, please send us an email that includes your AWS account ID, and our support representatives will enable access.
Snowflake (and most other data warehousing platforms) supports AWS S3 and the parquet format out of the box, which makes integration simple and quick.
Your data team has the following options:
- Use Batch's parquet data as an external table
- Perform a traditional, periodic load (COPY INTO) of the parquet data into a Snowflake table
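For the external table option, a minimal sketch might look like the following. It assumes a stage that already points at the parquet data; the names `batch_stage` and `batch_events_ext` are hypothetical, and `raw_json` is used only because it is one of the fields loaded in the COPY INTO walkthrough.

```sql
-- Hypothetical names: batch_stage is an existing stage that points at the parquet data.
create or replace external table batch_events_ext
  with location = @batch_stage
  file_format = (type = parquet)
  auto_refresh = false;

-- With no columns declared, each row is exposed as a VARIANT in the VALUE column.
select value:raw_json::varchar as raw_json
from batch_events_ext
limit 10;
```

An external table leaves the data in S3 and reads it at query time, whereas COPY INTO copies it into Snowflake-managed storage.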
To access the parquet data in S3, you will need to give Snowflake your AWS key and secret, along with the full S3 path to the parquet data. If the S3 path ends with a trailing /, Snowflake will recursively walk all of the "directories" under it and search for parquet data.
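As a sketch of the trailing-slash behavior, the stage here points at a whole collection prefix instead of a single file. The stage name `allfilesstage` is hypothetical, and the `$...` segments are placeholders for your own IDs.

```sql
-- Hypothetical stage name; the $... segments are placeholders for your own IDs.
create or replace stage allfilesstage
  url='s3://batchsh-datalakes/$team_id/$account_id/$datalake_id/$collection_id/'
  credentials=(aws_key_id='***' aws_secret_key='***')
  file_format = (type = parquet);

-- List the files Snowflake can see under the prefix.
list @allfilesstage;
```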
1. Create a schema
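A minimal sketch of this step, assuming a hypothetical schema name `batch_data` (any name works):

```sql
-- batch_data is a hypothetical name; pick whatever fits your account.
create schema if not exists batch_data;
use schema batch_data;
```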
2. Create a parquet format
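A minimal sketch of this step. The format is named `parquet` on the assumption that it is the one the stage definition refers to by name (`file_format = parquet`):

```sql
-- Named file format; the name "parquet" is assumed from the stage definition.
create or replace file format parquet
  type = parquet;
```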
3. Create a stage
create or replace stage onefilestage
url='s3://batchsh-datalakes/$team_id/$account_id/$datalake_id/$collection_id/year=2021/month=11/day=23/1637629076391797918.parquet'
credentials=(aws_key_id='***' aws_secret_key='***')
file_format = parquet;
4. Create a table (that points to the stage & format)
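create table mytable
using template (
select array_agg(object_construct(*))
from table(infer_schema(location=>'@onefilestage', file_format=>'parquet')));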
5. Load the data into the table
copy into MYTABLE
from (select $1:raw_json::varchar, $1:batch::variant, $1:client::variant
from @onefilestage);