Google BigQuery

Connect Google BigQuery to Formula Bot using a service account. Analyze large datasets with natural language queries and partitioned table support.

Prerequisites

  • A Google Cloud project with BigQuery enabled
  • A service account with BigQuery access
  • Service account JSON key file

Setup Steps

1. Create a Service Account

  1. Go to Google Cloud Console
  2. Navigate to IAM & Admin > Service Accounts
  3. Click Create Service Account
  4. Name it (e.g., formulabot-reader)
  5. Click Create and Continue

2. Assign Permissions

Grant the service account read access:

  1. Add the role: BigQuery Data Viewer
  2. Add the role: BigQuery Job User (required to run queries)
  3. Click Continue > Done

Only grant the minimum required permissions. BigQuery Data Viewer provides read-only access.

3. Create a Key

  1. Click on the service account you created
  2. Go to Keys tab
  3. Click Add Key > Create new key
  4. Select JSON format
  5. Download the key file (keep it secure!)
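Every service account JSON key contains a fixed set of fields (`type`, `project_id`, `private_key`, `client_email`, among others). Before uploading, you can sanity-check the file with a small standard-library script — a minimal sketch, not part of Formula Bot itself:

```python
import json

# Core fields present in every Google Cloud service account JSON key.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def validate_key_file(path: str) -> list[str]:
    """Return a list of problems; an empty list means the key looks valid."""
    with open(path) as f:
        key = json.load(f)
    problems = sorted(REQUIRED_FIELDS - key.keys())
    if key.get("type") != "service_account":
        problems.append("type must be 'service_account'")
    return problems
```

If this reports missing fields, the file was likely truncated or is not a service account key (for example, an OAuth client secret downloaded by mistake).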

4. Connect in Formula Bot

  1. Go to Connectors > Add Connector
  2. Select BigQuery
  3. Upload your service account JSON key
  4. Select the project and dataset(s) to access
  5. Click Test Connection
  6. Save when successful

Project and Dataset Selection

After uploading your service account key:

  1. Select the Project containing your data
  2. Choose which Datasets to make available
  3. Formula Bot will scan the schemas automatically

You can grant access to multiple datasets from the same project.

Example Questions

Basic Queries

  • "How many rows are in the events table?"
  • "Show me the first 100 rows from users"
  • "What columns are in the orders table?"

Aggregations

  • "What are total sales by month?"
  • "Calculate the average session duration"
  • "Count unique users by country"

Time-Based Analysis

  • "Show daily active users for the past 30 days"
  • "Compare this week's revenue to last week"
  • "What's the hourly traffic pattern?"

Complex Analysis

  • "Join orders with customers and show revenue by segment"
  • "Find users who made purchases but never completed onboarding"
  • "Calculate cohort retention rates"

Working with Partitioned Tables

BigQuery tables are often partitioned by date. For best performance:

Specify date ranges in your questions, for example:

  • "Show orders from the last 7 days"
  • "Analyze data for January 2024"

Including date filters on partitioned tables significantly reduces query costs and improves performance.
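A question like "show orders from the last 7 days" becomes a query with a date predicate on the partitioning column, which lets BigQuery skip partitions outside the range. As an illustration (the `order_date` column name is hypothetical), the filter amounts to:

```python
from datetime import date, timedelta

def last_n_days_filter(column: str, n: int, today: date) -> str:
    """Build a SQL predicate restricting a date-partitioned column to the last n days."""
    start = today - timedelta(days=n)
    return f"{column} BETWEEN '{start.isoformat()}' AND '{today.isoformat()}'"

# e.g. last_n_days_filter("order_date", 7, date(2024, 1, 31))
# → "order_date BETWEEN '2024-01-24' AND '2024-01-31'"
```

Without such a predicate, BigQuery scans every partition of the table, so the same question costs more and runs slower.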

Query Costs

BigQuery charges based on data scanned. Formula Bot:

  • Shows estimated query size before running
  • Optimizes queries to minimize data scanning
  • Respects column pruning (only scans needed columns)

Monitor costs in Google Cloud Console's BigQuery section.
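On-demand pricing is a flat rate per volume of data scanned, so you can translate an estimated query size into dollars directly. A rough sketch — the $6.25/TiB rate below is an assumption based on the published US on-demand rate; check Google Cloud's current pricing for your region:

```python
# Assumed US on-demand rate (USD per TiB scanned); verify against current pricing.
PRICE_PER_TIB_USD = 6.25
TIB = 2 ** 40  # bytes in one tebibyte

def estimated_cost_usd(bytes_scanned: int) -> float:
    """Estimate on-demand query cost from the number of bytes a query will scan."""
    return bytes_scanned / TIB * PRICE_PER_TIB_USD
```

For example, a query that scans 1 TiB costs about $6.25 at this rate, while one that scans 10 GiB costs only a few cents — which is why date filters and column pruning matter.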

Troubleshooting

Permission Denied

  • Verify service account has BigQuery Data Viewer role
  • Ensure BigQuery Job User role is assigned
  • Check the service account has access to the specific dataset

Invalid Credentials

  • Regenerate the service account key
  • Ensure you uploaded the full JSON file
  • Check the service account hasn't been disabled

Table Not Found

  • Verify the dataset is selected in connector settings
  • Check table permissions in BigQuery
  • Ensure the table exists in the selected project

Query Timeout

  • Add date filters to reduce data scanned
  • Ask for aggregated data instead of raw rows
  • Break complex queries into smaller parts

Best Practices

Use Specific Date Ranges

Always specify time periods to reduce costs:

  • "Last 30 days" instead of "all data"
  • "January 2024" instead of "this year"

Reference Tables Clearly

Include dataset names when asking about specific tables:

  • "Show data from analytics.events"
  • "Query the sales dataset"
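BigQuery identifies every table by a three-part name, `project.dataset.table`, quoted with backticks in SQL. A trivial helper illustrating the convention (the names shown are placeholders):

```python
def qualified_table(project: str, dataset: str, table: str) -> str:
    """Return the fully qualified, backtick-quoted BigQuery table reference."""
    return f"`{project}.{dataset}.{table}`"

# e.g. qualified_table("my-project", "analytics", "events")
# → "`my-project.analytics.events`"
```

Using at least the `dataset.table` form in your questions removes ambiguity when the same table name exists in more than one dataset.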

Limit Large Results

Ask for top N results or aggregations:

  • "Top 100 customers by revenue"
  • "Total sales by category" (instead of all transactions)

Next Steps