Google BigQuery
Connect your Google BigQuery data warehouse to Formula Bot using a service account and analyze large datasets with natural language queries, including partitioned tables.
Prerequisites
- A Google Cloud project with BigQuery enabled
- A service account with BigQuery access
- Service account JSON key file
Setup Steps
1. Create a Service Account
- Go to Google Cloud Console
- Navigate to IAM & Admin > Service Accounts
- Click Create Service Account
- Name it (e.g., formulabot-reader)
- Click Create and Continue
2. Assign Permissions
Grant the service account read access:
- Add the role: BigQuery Data Viewer
- Add the role: BigQuery Job User (required to run queries)
- Click Continue > Done
Only grant the minimum required permissions. BigQuery Data Viewer provides read-only access.
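To double-check the grant, you can export the project's IAM policy (`gcloud projects get-iam-policy PROJECT --format=json`) and verify both roles are bound to the account. A minimal sketch, assuming the hypothetical account name formulabot-reader:

```python
REQUIRED_ROLES = {"roles/bigquery.dataViewer", "roles/bigquery.jobUser"}

def missing_roles(policy: dict, service_account_email: str) -> set:
    """Return the required BigQuery roles the account does not yet hold."""
    member = f"serviceAccount:{service_account_email}"
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }
    return REQUIRED_ROLES - granted

# Minimal example policy: only Data Viewer has been granted so far.
policy = {
    "bindings": [
        {
            "role": "roles/bigquery.dataViewer",
            "members": ["serviceAccount:formulabot-reader@my-project.iam.gserviceaccount.com"],
        }
    ]
}
print(missing_roles(policy, "formulabot-reader@my-project.iam.gserviceaccount.com"))
# -> {'roles/bigquery.jobUser'}
```

If the returned set is empty, both required roles are in place.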
3. Create a Key
- Click on the service account you created
- Go to Keys tab
- Click Add Key > Create new key
- Select JSON format
- Download the key file (keep it secure!)
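Before uploading, it can help to sanity-check the downloaded file. Service account keys are JSON documents with a fixed set of fields; a minimal check (the key values below are placeholders):

```python
EXPECTED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def check_key(key: dict) -> list:
    """Return a list of problems found in a service account key dict."""
    problems = [f"missing field: {f}" for f in sorted(EXPECTED_FIELDS - key.keys())]
    if key.get("type") != "service_account":
        problems.append("'type' should be 'service_account'")
    return problems

# Placeholder key for illustration; load your real file with json.load().
key = {
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...",
    "client_email": "formulabot-reader@my-project.iam.gserviceaccount.com",
}
print(check_key(key))  # -> [] means the file looks structurally valid
```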
4. Connect in Formula Bot
- Go to Connectors > Add Connector
- Select BigQuery
- Upload your service account JSON key
- Select the project and dataset(s) to access
- Click Test Connection
- Save when successful
Project and Dataset Selection
After uploading your service account key:
- Select the Project containing your data
- Choose which Datasets to make available
- Formula Bot will scan the schemas automatically
You can grant access to multiple datasets from the same project.
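Formula Bot's scan itself is internal, but the same schema information is exposed in BigQuery through the INFORMATION_SCHEMA views. A sketch of the queries involved, assuming a hypothetical project and dataset:

```python
def schema_queries(project: str, dataset: str) -> dict:
    """SQL for listing tables and columns in one dataset via INFORMATION_SCHEMA."""
    scope = f"`{project}.{dataset}`"
    return {
        "tables": f"SELECT table_name FROM {scope}.INFORMATION_SCHEMA.TABLES",
        "columns": (
            f"SELECT table_name, column_name, data_type "
            f"FROM {scope}.INFORMATION_SCHEMA.COLUMNS "
            f"ORDER BY table_name, ordinal_position"
        ),
    }

q = schema_queries("my-project", "analytics")
print(q["tables"])
# -> SELECT table_name FROM `my-project.analytics`.INFORMATION_SCHEMA.TABLES
```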
Example Questions
Basic Queries
- "How many rows are in the events table?"
- "Show me the first 100 rows from users"
- "What columns are in the orders dataset?"
Aggregations
- "What are total sales by month?"
- "Calculate the average session duration"
- "Count unique users by country"
Time-Based Analysis
- "Show daily active users for the past 30 days"
- "Compare this week's revenue to last week"
- "What's the hourly traffic pattern?"
Complex Analysis
- "Join orders with customers and show revenue by segment"
- "Find users who made purchases but never completed onboarding"
- "Calculate cohort retention rates"
Working with Partitioned Tables
BigQuery tables are often partitioned by date. For best performance:
- Specify date ranges in your questions
- "Show orders from the last 7 days"
- "Analyze data for January 2024"
Including date filters on partitioned tables significantly reduces query costs and improves performance.
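As a sketch of what such a date filter looks like against a daily-partitioned table (the table and column names are hypothetical; the DATE literal syntax is standard BigQuery SQL):

```python
from datetime import date, timedelta

def last_n_days_query(table: str, partition_col: str, n: int, today: date) -> str:
    """Query only the last n daily partitions instead of scanning the whole table."""
    start = today - timedelta(days=n)
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE {partition_col} >= DATE '{start.isoformat()}'"
    )

print(last_n_days_query("my-project.shop.orders", "order_date", 7, date(2024, 1, 31)))
# -> SELECT * FROM `my-project.shop.orders` WHERE order_date >= DATE '2024-01-24'
```

Because the filter targets the partitioning column, BigQuery prunes partitions outside the range and bills only for the data it actually reads.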
Query Costs
BigQuery charges based on data scanned. Formula Bot:
- Shows estimated query size before running
- Optimizes queries to minimize data scanning
- Selects only the columns a question needs, so BigQuery's column pruning scans less data
Monitor costs in Google Cloud Console's BigQuery section.
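For a rough sense of scale, on-demand pricing is a flat rate per volume scanned. A back-of-the-envelope estimate, assuming a rate of $6.25 per TiB (verify against Google's current pricing page before relying on it):

```python
def estimated_cost_usd(bytes_scanned: int, price_per_tib: float = 6.25) -> float:
    """On-demand cost estimate: (bytes scanned / 1 TiB) * assumed rate."""
    tib = bytes_scanned / 2**40
    return tib * price_per_tib

# A query that scans 500 GiB, at the assumed rate:
print(round(estimated_cost_usd(500 * 2**30), 2))
# -> 3.05
```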
Troubleshooting
Permission Denied
- Verify service account has BigQuery Data Viewer role
- Ensure BigQuery Job User role is assigned
- Check the service account has access to the specific dataset
Invalid Credentials
- Regenerate the service account key
- Ensure you uploaded the full JSON file
- Check the service account hasn't been disabled
Table Not Found
- Verify the dataset is selected in connector settings
- Check table permissions in BigQuery
- Ensure the table exists in the selected project
Query Timeout
- Add date filters to reduce data scanned
- Ask for aggregated data instead of raw rows
- Break complex queries into smaller parts
Best Practices
Use Specific Date Ranges
Always specify time periods to reduce costs:
- "Last 30 days" instead of "all data"
- "January 2024" instead of "this year"
Reference Tables Clearly
Include dataset names when asking about specific tables:
- "Show data from analytics.events"
- "Query the sales dataset"
Limit Large Results
Ask for top N results or aggregations:
- "Top 100 customers by revenue"
- "Total sales by category" (instead of all transactions)