Ingesting CaliberMind Data Into Snowflake

Nolan Garrido Updated by Nolan Garrido

CaliberMind natively is stored in Google BigQuery, but can be transferred to your Snowflake instance in several different ways. Each of these options may have cost implications with your CaliberMind contract or costs from other vendors, so please check with your Sales Rep or account manager if you have any questions.

Snowflake Logo

Snowflake is able to use Google Cloud Storage buckets as an external stage, regardless of which cloud provider your Snowflake instance is hosted on. This means that you can access data stored in a Google Cloud Storage bucket even if your Snowflake instance is hosted on AWS or Azure. Snowflake has detailed step-by-step instructions in their help docs that can help explain the overall process in more detail.

This process can only be completed by a user who has been granted the ACCOUNTADMIN role in your organization.

  1. Set up CaliberMind elements in Snowflake: We strongly recommend creating a unique role, schema and warehouse for CaliberMind. Depending on your organization's needs and security requirements, you may want to create a standalone database as well.
    The code block below will help you set up a new role, new user, new warehouse, and new database for CaliberMind. It will grant usage permissions on the objects it creates which will allow this CaliberMind role to operate in your instance without having access to anything else in your system.
    You must change the password before you execute the code block. We would recommend using a password generator like Lastpast.com to create something that is at least 32 characters long.

    begin;

    -- create variables for user / password / role / warehouse / database (needs to be uppercase for objects)
    set role_name = 'CALIBERMIND_ROLE';
    set user_name = 'CALIBERMIND_USER';
    set user_password = '[Enter very secure password here]';
    set warehouse_name = 'CALIBERMIND_WAREHOUSE';
    set database_name = 'CALIBERMIND_DATABASE'; /* This is optional */

    -- change role to securityadmin for user / role steps
    use role securityadmin;

    -- create role for CaliberMind
    create role if not exists identifier($role_name);
    grant role identifier($role_name) to role SYSADMIN;

    -- create a user for Calibermind
    create user if not exists identifier($user_name)
    password = $user_password
    default_role = $role_name
    default_warehouse = $warehouse_name;

    grant role identifier($role_name) to user identifier($user_name);

    -- change role to sysadmin for warehouse / database steps
    use role sysadmin;

    -- create a warehouse for CaliberMind
    create warehouse if not exists identifier($warehouse_name)
    warehouse_size = xsmall
    warehouse_type = standard
    auto_suspend = 10
    auto_resume = true
    initially_suspended = true;

    -- create database for Calibermind
    create database if not exists identifier($database_name);

    -- grant Calibermind role access to warehouse
    grant USAGE
    on warehouse identifier($warehouse_name)
    to role identifier($role_name);

    -- grant Calibermind access to database
    grant CREATE SCHEMA, MONITOR, USAGE
    on database identifier($database_name)
    to role identifier($role_name);
    grant create stage on schema public to role ($role_name);

    commit;
  2. Create a Storage Integration: You will need to update the below example based on your specific CaliberMind account setup. In order to complete this step, CaliberMind will need to first create your GCS bucket and will provide you with their URI. Enter this location in the storage_allowed_locations field. If you do not know this value, please reach out to your account manager and they will provide it to you.
    use role ACCOUNTADMIN;

    create storage integration calibermind_int
    type = external_stage
    storage_provider = gcs
    enabled = true
    storage_allowed_locations = ('gcs://[calibermind bucket]/[path to data]/');

    grant usage on integration calibermind_int to role CALIBERMIND_ROLE;
  3. Creating the integration will create a Google service account from inside Snowflake. You, the client, will own this account and it is connected to your Snowflake instance. Since Caliber Mind does not have access to this account, so you will need to provide Caliber Mind with the name of the service account that Snowflake has generated.
    To find the name of the service account, you need to describe the storage integration we created in the previous step.
    desc storage integration calibermind_int;

    Your results window should look something like this below. Please copy the property_value from the STORAGE_GCP_SERVICE_ACCOUNT row to your account manager.
    +-----------------------------+---------------+-----------------------------------------------------------------------------+
    | property | property_type | property_value |
    +-----------------------------+---------------+-----------------------------------------------------------------------------+
    | ENABLED | Boolean | true |
    | STORAGE_ALLOWED_LOCATIONS | List | gcs://[calibermind project]/[data folder] |
    | STORAGE_GCP_SERVICE_ACCOUNT | String | service-account-id@[calibermind_project].iam.gserviceaccount.com |
    +-----------------------------+---------------+-----------------------------------------------------------------------------+
  4. CaliberMind will give the service account you provided access to your organization's Cloud storage buckets. Once that account has permissions, you will want to set up a FILE_FORMAT so Snowflake knows how to interpret the file it is receiving and a STAGE so Snowflake can represent the data as a table inside Snowflake for the next step.
    create or replace file format calibermind_csv_gzip  
    type = csv
    field_delimiter = ','
    skip_header = 1
    null_if = ('NULL', 'null','NaN')
    empty_field_as_null = true
    compression = gzip;

    use schema calibermind_database.public;

    create stage calibermind_gcp_stage
    url = 'gcs://[calibermind project]/[data folder]'
    storage_integration = calibermind_int;
    file_format = calibermind_csv_gzip;
  5. Your connection between Snowflake and your CaliberMind Google Storage should be all set up at this point. The final step is something you will need to repeat to refresh your data. We recommend scheduling a job similar to this nightly to do a full table refresh. Please note, at this time, CaliberMind does not support incremental data loads and instead creates a completely new replacement each night.
    set table_name = cm_attribution;

    alter warehouse calibermind_warehouse resume;

    truncate table @table_name;

    copy into @table_name
    from @calibermind_gcp_stage\[bucket_name]\[table_folder]
    pattern='*.gzip';

    alter warehouse calibermind_warehouse suspend;

And that's it! You should now see CaliberMind data inside of your Snowflake instance. Keep in mind the COPY INTO command only works for a single table, so you may need to specify which folder/table you are looking for or modify the pattern variable to tell Snowflake which specific files to choose. CaliberMind will make your entire data instance available in this format, so if you do not specific which specific file you are interested in copying, you can generate errors or ingest corrupted data files. If you have any additional questions on this process, please reach out to your account manager or create a support ticket by sending an email to help@calibermind.com.

OPTION 2 - Google Cloud Storage to Snowflake via Customer's Internal ETL Processes

  1. Each night, the CaliberMind system will export all datasets into a an encrypted Google Cloud Storage location.
  2. Using your ETL service (Fivetran, Alteryx, Snaplogic, Mulesoft, etc), set up a New Connector for Google Cloud Storage.
  3. Follow the steps provided by your vendor to complete the connection. You should make one new connector per CaliberMind table being ingested. If you are unsure which tables your organization will need, please schedule a meeting with our support team to review your options.

How did we do?

CaliberMind Data Export to Google Cloud Storage

Klaviyo connector [Via Fivetran]

Contact