Anatomy of a €50k/month BigQuery Bill
I regularly see BigQuery bills from growing scale-ups. The pattern is often the same: a bill that exploded without anyone really understanding why.
In 90% of the cases I audit, the bill tripled in less than a year, and the data team discovers the problem when the CFO shows up with questions.
In this article, I break down a typical €50k/month bill to show where the money goes and where the optimizations are hiding.
The typical profile
Let's imagine a classic e-commerce scale-up:
- Size: 200 employees, 50 active BigQuery users (including 8 in the core data team)
- Data team: 2 Data Engineers, 4 Analysts, 2 Data Scientists. The rest: Product, Finance, Marketing with Looker access
- Volume: 2 TB of new data per day
- Stack: BigQuery + dbt + Looker
- History: 3 years of data, ~500 TB total
The monthly bill just went from €15k to €50k in 8 months. Nobody knows exactly why.
I saw this exact situation at a client: the Data Lead was convinced it was the data volume that had exploded. In reality, 60% of the increase came from a single poorly configured Looker dashboard, refreshing every 15 minutes for 30 users.
Bill breakdown
Overview
MONTHLY TOTAL: €52,340
├── BigQuery Analysis (Queries) €35,200 67%
│ ├── On-demand queries €28,000
│ └── BI Engine €7,200
│
├── BigQuery Storage €8,500 16%
│ ├── Active storage €6,200
│ └── Long-term storage €2,300
│
├── Dataflow €4,800 9%
│
├── Cloud Storage (GCS) €2,400 5%
│
└── Other (Pub/Sub, etc.) €1,440 3%
First observation: 67% of costs come from queries. That's where you need to dig.
Note on BI Engine: That €7,200 corresponds to a 100 GB BI Engine memory reservation (~$0.0416/GB/hour × 730h × 100 GB). BI Engine accelerates Looker queries by keeping data in memory. The problem: many teams enable BI Engine "for performance" without checking if their dashboards actually benefit. In this case, only 3 out of 25 dashboards were actually using the BI Engine cache.
Compute analysis (€35k)
Who's consuming?
-- Top 10 consumers (last 30 days)
-- EUR price: $6.25/TiB × 0.92 EUR/USD ≈ €5.75/TiB
SELECT
user_email,
COUNT(*) as query_count,
SUM(total_bytes_billed) / POW(1024, 4) AS tb_scanned,
ROUND(SUM(total_bytes_billed) / POW(1024, 4) * 5.75, 2) AS cost_eur
FROM `region-eu`.INFORMATION_SCHEMA.JOBS
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
AND job_type = 'QUERY'
GROUP BY user_email
ORDER BY cost_eur DESC
LIMIT 10;
Typical result:
| User | Queries | TB scanned | Cost |
|---|---|---|---|
| scheduler@project.iam | 45,000 | 2,100 | €12,075 |
| looker@project.iam | 128,000 | 1,450 | €8,337 |
| analyst-marie@ | 3,200 | 890 | €5,117 |
| analyst-paul@ | 2,800 | 620 | €3,565 |
| dbt-prod@ | 1,200 | 480 | €2,760 |
Insights:
- The scheduler (Airflow/dbt) represents 35% of costs → pipelines to audit
- Looker runs 128k queries → lots of cache misses, poorly optimized dashboards
- 2 analysts consume as much as the rest of the team → unoptimized exploratory queries
Which queries cost the most?
-- Top 20 most expensive queries
SELECT
job_id,
user_email,
ROUND(total_bytes_billed / POW(1024, 4) * 5.75, 2) AS cost_eur,
ROUND(total_bytes_billed / POW(1024, 3), 1) AS gb_scanned,
SUBSTR(query, 1, 200) AS query_preview
FROM `region-eu`.INFORMATION_SCHEMA.JOBS
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
AND job_type = 'QUERY'
ORDER BY total_bytes_billed DESC
LIMIT 20;
Recurring patterns:
- SELECT * on unpartitioned tables: One query, 2 TB scanned, €12
- Dashboards without date filters: Each refresh scans the entire history
- Poorly ordered JOINs: Combinatorial explosion
- Identical recurring queries: Same query 50x/day without cache
Strong opinion: I strongly advise against on-demand mode beyond €10k/month in consumption. At that point, switch to BigQuery Editions (Standard or Enterprise). On-demand is comfortable but it's a blank check. I've seen too many teams get surprised by a 3x bill after a pipeline incident or an overly curious intern.
Storage analysis (€8.5k)
Active vs long-term breakdown
Typical problems:
| Dataset | Active | Long-term | Problem |
|---|---|---|---|
| raw_events | 180 TB | 20 TB | No lifecycle |
| staging | 45 TB | 0 TB | Temp tables never deleted |
| snapshots | 80 TB | 10 TB | Snapshots kept indefinitely |
The Active/Long-term ratio should be ~30/70, here it's the opposite.
The last time I saw this pattern, the client was paying €3,000/month for staging tables created by a former employee who left 18 months ago. Nobody dared delete them "just in case".
The 6 quick wins
1. Enable require_partition_filter (savings: ~€8k/month)
ALTER TABLE `project.dataset.events`
SET OPTIONS (require_partition_filter = TRUE);
What people often misunderstand: Many think partitioning is enough. It's not. Without require_partition_filter, your users can still scan the entire table. They will, out of laziness or by mistake. Force the constraint, or it doesn't exist.
2. Fix Looker dashboards (savings: ~€4k/month)
- Mandatory date filters
- Materialized Views for common aggregations
- BI Engine on frequent tables
3. Optimize dbt pipelines (savings: ~€3k/month)
# dbt_project.yml
models:
+partition_by:
field: event_date
data_type: date
+cluster_by: ["country", "event_type"]
4. Clean up staging (savings: ~€1.5k/month)
ALTER TABLE `project.staging.temp_table`
SET OPTIONS (expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY));
5. Lifecycle policy on raw_events (savings: ~€2k/month)
ALTER TABLE `project.raw.events`
SET OPTIONS (partition_expiration_days = 730);
6. Train the teams (savings: ~€3k/month)
- No SELECT *
- Always filter on the partition
- Dry run before execution
Action plan
| Week | Action | Estimated savings |
|---|---|---|
| 1 | Partition filters | €8,000 |
| 2 | Top 5 dashboards | €4,000 |
| 3 | dbt pipelines | €3,000 |
| 4 | Cleanup + lifecycle | €3,500 |
| 5 | Team training | €3,000 |
Total: ~€21,500/month = 41% reduction
Conclusion
A €50k BigQuery bill is rarely inevitable. 30-50% can be saved with:
- Enforcing best practices (partition filters)
- Optimizing heavy consumers (dashboards, pipelines)
- Cleaning up storage (lifecycle, zombie tables)
- Training teams (cost-aware culture)
The hardest part isn't technical. It's maintaining these practices over time. I've seen teams clean up once, save 40%, then fall back to the same level 6 months later because nobody was monitoring.
If this type of content helps you, you can follow me on LinkedIn or subscribe to the newsletter for future articles.