Retour au blog
Snowflake

The 5 Most Expensive Snowflake Mistakes

Classic Snowflake waste patterns I find in almost every environment. Each can represent thousands of euros per month.

10 février 202612 min de lecture

The 5 Most Expensive Snowflake Mistakes

After analyzing many Snowflake environments, I consistently find the same mistakes. Each can represent 10-30% of the monthly bill.

In 90% of environments I audit, at least 3 of these 5 mistakes are present. It's not a question of team competence. It's just that Snowflake is configured by default for performance, not for cost.

Here are the 5 most common waste patterns, and how to fix them.


Mistake #1: Auto-suspend disabled or too long

The problem

By default, Snowflake creates warehouses with a 5-minute auto-suspend. Many admins disable it "to avoid startup times" or increase it to 10-15 minutes.

Result: The warehouse runs idle for hours.

I saw at one client an XLARGE warehouse with auto-suspend disabled "because users complained about the 2-second cold start". Cost of that decision: €18,000/month of pure waste. To avoid 2 seconds of waiting.

The calculation

A MEDIUM warehouse (4 credits/hour) with auto-suspend disabled:

Actual usage: 2h/day of queries
Time billed: 24h/day (never suspended)
Waste: 22h × 4 credits × $3 = $264/day = $7,920/month

The solution

-- Check all warehouses
SELECT
  warehouse_name,
  warehouse_size,
  auto_suspend,
  CASE
    WHEN auto_suspend IS NULL THEN 'NEVER'
    WHEN auto_suspend > 300 THEN 'TOO LONG'
    ELSE 'OK'
  END as status
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSES
WHERE deleted IS NULL;

-- Fix
ALTER WAREHOUSE my_warehouse SET AUTO_SUSPEND = 60;

Rule: 60 seconds for interactive workloads, 120 seconds maximum for batch.

My strong opinion: I NEVER recommend disabling auto-suspend, even on "critical" warehouses. Snowflake's cold start is 1-3 seconds. If your users can't wait 2 seconds, the problem isn't technical. It's a communication problem.

Typical savings: 20-40% of compute costs


Mistake #2: Oversized warehouses

The problem

"We took a LARGE to be sure we had enough power."

Except most queries don't need a LARGE. An X-SMALL or SMALL is enough for 80% of use cases.

The worst situation I've seen: a BI team with 15 analysts, each with their own dedicated MEDIUM warehouse. Average usage: 45 minutes per day. Monthly cost: €12,000. After consolidation to a single SMALL warehouse with auto-scaling: €800/month. Same perceived performance for users.

How to detect it

-- Analyze actual load per warehouse
SELECT
  warehouse_name,
  warehouse_size,
  AVG(avg_running) as avg_concurrent_queries,
  AVG(avg_queued_load) as avg_queue,
  MAX(avg_queued_load) as max_queue
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_LOAD_HISTORY
WHERE start_time > DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2
ORDER BY avg_concurrent_queries DESC;

Signs of oversizing:

  • avg_concurrent_queries < 1 → The warehouse is often idle
  • avg_queue = 0 → No contention, no need for more power
  • Simple queries (< 10 sec) on a LARGE warehouse

The solution

-- Reduce size
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'SMALL';

-- Or use auto-scaling
ALTER WAREHOUSE analytics_wh SET
  WAREHOUSE_SIZE = 'SMALL'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY = 'STANDARD';

Rule: Start small, scale up if queue > 0 consistently.

Typical savings: 50-75% on affected warehouses


Mistake #3: Excessive Time Travel

The problem

Snowflake keeps by default:

  • Standard: 1 day of Time Travel
  • Enterprise: up to 90 days of Time Travel

Many companies leave the Enterprise default (90 days) on ALL tables, including:

  • Temporary staging tables
  • Log tables
  • Tables recreated daily

The calculation

A 100 GB table modified daily with 90 days of Time Travel:

Time Travel storage: 100 GB × 90 versions = 9 TB
Cost: 9 TB × $23/TB/month = $207/month for ONE table

I saw at an e-commerce client their click tracking table with 90 days of Time Travel. This table was truncated and reloaded every day. Result: 2.3 TB of Time Travel storage for a table with 25 GB of active data. Auditing their entire environment, I found 47 tables with the same problem. Total cost: €8,400/month of useless Time Travel.

How to detect it

-- Tables with excessive Time Travel storage
SELECT
  table_catalog,
  table_schema,
  table_name,
  ROUND(active_bytes / POW(1024, 3), 2) as active_gb,
  ROUND(time_travel_bytes / POW(1024, 3), 2) as time_travel_gb,
  ROUND(time_travel_bytes / NULLIF(active_bytes, 0), 1) as ratio
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS
WHERE time_travel_bytes > active_bytes
ORDER BY time_travel_bytes DESC
LIMIT 20;

Red flag: ratio > 2 = Time Travel costs more than active data.

The solution

-- Reduce Time Travel on non-critical tables
ALTER TABLE staging.temp_data SET DATA_RETENTION_TIME_IN_DAYS = 1;

-- Temporary tables: 0 days
ALTER TABLE staging.daily_load SET DATA_RETENTION_TIME_IN_DAYS = 0;

-- Use TRANSIENT tables for staging
CREATE TRANSIENT TABLE staging.temp_data (...);

Typical savings: 15-30% of storage costs


Mistake #4: Queries without clustering on large tables

The problem

A 500 GB table without a clustering key. Every filtered query potentially scans the entire table.

Snowflake doesn't charge per TB scanned like BigQuery, but:

  • More data scanned = longer execution time
  • Longer time = more credits consumed

What teams often misunderstand: They think clustering is a "nice to have" optimization. In reality, on a table over 50 GB that's regularly filtered, NOT having a clustering key is a design error. Period.

How to detect it

-- Large tables without clustering
SELECT
  table_catalog,
  table_schema,
  table_name,
  ROUND(bytes / POW(1024, 3), 2) as size_gb,
  clustering_key
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES
WHERE bytes > 10 * POW(1024, 3)  -- > 10 GB
  AND clustering_key IS NULL
  AND deleted IS NULL
ORDER BY bytes DESC;
-- Long queries on these tables
SELECT
  query_id,
  query_text,
  total_elapsed_time / 1000 as seconds,
  bytes_scanned / POW(1024, 3) as gb_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time > DATEADD(day, -7, CURRENT_TIMESTAMP())
  AND total_elapsed_time > 60000  -- > 1 minute
ORDER BY total_elapsed_time DESC
LIMIT 20;

The solution

-- Add a clustering key based on frequent filters
ALTER TABLE events CLUSTER BY (event_date, country);

-- Check effectiveness
SELECT SYSTEM$CLUSTERING_INFORMATION('events');

Rule: Cluster by the most filtered columns, in order of selectivity.

Typical savings: 30-60% of execution time on affected queries


Mistake #5: No Resource Monitors

The problem

Without resource monitors, nothing prevents:

  • A runaway query from consuming 1000 credits
  • A forgotten warehouse from running all weekend
  • A user from launching massive queries by mistake

I've seen bills double over a weekend because of a misconfigured job. Literally: €45,000 gone between Friday evening and Monday morning because an infinite loop was relaunching the same ETL pipeline every 30 seconds.

The solution

-- Account-level monitor (safety net)
CREATE RESOURCE MONITOR account_safety
  WITH CREDIT_QUOTA = 5000
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 75 PERCENT DO NOTIFY
    ON 90 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;

ALTER ACCOUNT SET RESOURCE_MONITOR = account_safety;

-- Monitor per critical warehouse
CREATE RESOURCE MONITOR prod_etl_monitor
  WITH CREDIT_QUOTA = 500
  FREQUENCY = MONTHLY
  TRIGGERS
    ON 80 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE prod_etl_wh SET RESOURCE_MONITOR = prod_etl_monitor;

Recommended levels

LevelQuotaAction
75%-Notification to FinOps team
90%-Notification + investigation
100%-Suspend (except critical prod)

Typical savings: Protection against incidents (potentially thousands of €)


Quick diagnostic checklist

-- Complete diagnostic script
-- Run to quickly identify problems

-- 1. Misconfigured warehouses
SELECT warehouse_name, warehouse_size, auto_suspend,
  CASE WHEN auto_suspend IS NULL OR auto_suspend > 300 THEN 'KO' ELSE 'OK' END as status
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSES WHERE deleted IS NULL;

-- 2. Top consumers (last 7 days)
SELECT warehouse_name, SUM(credits_used) as credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time > DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY 1 ORDER BY 2 DESC LIMIT 10;

-- 3. Excessive Time Travel
SELECT table_name,
  ROUND(time_travel_bytes / POW(1024, 3), 1) as tt_gb,
  ROUND(active_bytes / POW(1024, 3), 1) as active_gb
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS
WHERE time_travel_bytes > active_bytes
ORDER BY time_travel_bytes DESC LIMIT 10;

-- 4. Existing resource monitors
SELECT name, credit_quota, frequency, suspend_at, suspend_immediate_at
FROM SNOWFLAKE.ACCOUNT_USAGE.RESOURCE_MONITORS;

Honorable mentions

These mistakes didn't make the top 5, but I see them regularly:

  • Single warehouse shared between dev/prod: Dev queries block prod pipelines. A dedicated warehouse per environment costs less than a production incident.

  • Multi-cluster disabled on BI warehouse: 10 analysts waiting in queue on a single-cluster warehouse. Auto-scaling would have cost less than the lost time.

  • COPY INTO without purge: Files remain in the stage after loading. I've seen 12 TB of "temporary" CSV files accumulated over 18 months.

  • Production clones without thinking: A clone costs nothing at first, but as soon as data diverges, storage explodes. I've seen test environments cost more than prod.

  • Query acceleration enabled everywhere: The feature is billed per use. On already fast queries, it's pure waste.


Summary

MistakeTypical impactFixDifficulty
Auto-suspend20-40% compute1 SQL commandEasy
Oversizing50-75% on WHAnalysis + resizeMedium
Time Travel15-30% storagePer-table policyMedium
No clustering30-60% on queriesALTER CLUSTER BYMedium
No monitorsUnlimited riskCREATE MONITOREasy

Total potential: 30-60% savings on the overall bill.


Conclusion

These 5 mistakes are present in almost every Snowflake environment I audit. The good news: they're all fixable in a few hours.

Auto-suspend at 60 seconds and resource monitors are the most impactful quick wins. Start there. You'll see results on your next bill.

If this type of content helps you, you can follow me on LinkedIn or subscribe to the newsletter for future articles.


Questions about your Snowflake environment? Feel free to reach out directly. I respond to all messages.

Jonathan Kini

Jonathan Kini

J'aide les équipes data à réduire et maîtriser leurs coûts BigQuery et Snowflake, sans sacrifier la performance. De la startup aux environnements data à grande échelle.