What I watch in Snowflake before compute cost becomes a surprise
I investigate Snowflake compute spikes in a fixed order: warehouse metering, idle gap, load, query history, later attribution, and pruning evidence before I start tuning.
Snowflake compute gets expensive fast when one warehouse bill jumps and nobody can say whether the extra credits came from idle minutes, warehouse pressure, or one recurring query family.
That is when bad triage starts. Someone reaches for a bigger warehouse. Someone else opens one loud query profile. The basic question is still unsettled.
I do not start with resizing or rewriting. I start with one warehouse and one question at a time.
My first pass stays fixed: metering, idle gap, load, query history, lagged attribution, then pruning if scans still look wrong.
Problem
The expensive mistake is not only a higher warehouse bill.
It is asking the wrong question first.
I see teams jump from “this warehouse cost more” to “rewrite that query” before anyone checks whether the warehouse was mostly busy, mostly idle, or stuck resuming.
That is how one noisy morning turns into a week of unfocused tuning.
Per-query attribution creates a second trap. As of 2026-02-10, Snowflake documents that QUERY_ATTRIBUTION_HISTORY can lag by up to eight hours, excludes warehouse idle time, and omits very short queries (<= ~100ms), so I treat it as a later confirmation layer instead of the first read (Snowflake docs).
If I wait for it before same-day triage, I lose the fast path. If I treat it like the whole bill, I blame one query family for cost it did not fully own.
This is narrower than Five pipeline observability signals before more orchestration.
Here I am not asking whether the pipeline is healthy. I am asking why one warehouse got expensive and which layer earns the next ten minutes.
Default approach
- Start with
WAREHOUSE_METERING_HISTORY. Confirm which warehouse changed and whether the jump is new, recurring, or already normalizing. - Compare
credits_used_computewithcredits_attributed_compute_queriesbefore blaming one query family. The gap is where warehouse idle time starts to show up. - Read
WAREHOUSE_LOAD_HISTORYnext. I want to know whether the warehouse was busy, overloaded, queued during provisioning, or blocked. - Use
QUERY_HISTORYfor the same-day drill-down. That is where I inspect queue time, spill, scan volume, cache use, and partitions scanned. - Return to
QUERY_ATTRIBUTION_HISTORYlater. I use it to confirm which query family actually consumed the compute credits once the lagged data lands. - Use
TABLE_QUERY_PRUNING_HISTORYonly if the expensive pattern still points to unnecessary scans. That answers a different question from metering or attribution.
All five views live in Account Usage. As of 2026-02-10, Snowflake lists WAREHOUSE_METERING_HISTORY, WAREHOUSE_LOAD_HISTORY, and TABLE_QUERY_PRUNING_HISTORY under USAGE_VIEWER, QUERY_HISTORY under GOVERNANCE_VIEWER, and QUERY_ATTRIBUTION_HISTORY under either role in Account Usage. The order still matters more than memorizing every column.
If the question shifts from “why did this warehouse get expensive?” to “what cloud services actually billed?”, I step out to METERING_DAILY_HISTORY.
The warehouse metering views tell me consumed credits first. That is the right triage start, but it is not the whole billed-cloud-services answer.
Example
Imagine a daily transform warehouse named TRANSFORM_DAILY_WH.
A model fan-out ships in the same morning release. The warehouse that usually burns a steady amount of compute between 05:45 ET and 07:00 ET suddenly costs more than twice its normal run.
I do not want a dashboard first.
I want one investigation note I can scan in order:
| Field | Value |
|---|---|
| Warehouse | TRANSFORM_DAILY_WH |
| Window | 2026-02-10 05:45-07:00 ET |
| 1. WAREHOUSE_METERING_HISTORY | credits_used_compute: 18.4 credits_attributed_compute_queries: 10.9 idle compute gap: 7.5 |
| 2. WAREHOUSE_LOAD_HISTORY | avg_running elevated during 06:10-06:40 avg_queued_load near zero avg_queued_provisioning spikes at warehouse resume avg_blocked negligible |
| 3. QUERY_HISTORY | one query_parameterized_hash dominates bytes_scanned queued_overload_time low bytes_spilled_to_remote_storage high percentage_scanned_from_cache low partitions_scanned jumped versus the prior run |
| 4. QUERY_ATTRIBUTION_HISTORY (later check) | same query family accounts for 62% of attributed compute attribution confirms the suspect pattern, not the full warehouse bill |
| 5. TABLE_QUERY_PRUNING_HISTORY | affected fact table pruning_ratio: 0.18 partitions_scanned_per_query far above the recent baseline |
| Decision | do not resize first fix the scan-heavy query pattern shorten idle time on this task warehouse instead of paying for empty minutes |
That note tells me where the next ten minutes should go.
The metering lines tell me the warehouse did use more compute, but not all of it was query-attributed. That is my cue to keep both workload and idle time in view.
The load lines tell me the warehouse was not overloaded for the morning. avg_queued_load stays low. avg_queued_provisioning spikes around resume, but it does not explain the whole bill.
That keeps me from resizing first.
Then I use QUERY_HISTORY as the fast path. I group the repeat pattern with query_parameterized_hash, then inspect queue time, spill, cache use, and partitions scanned.
In this case one query family is scanning more, spilling more, and using less cache than the earlier baseline.
Later, once QUERY_ATTRIBUTION_HISTORY catches up, I can confirm that the same family consumed most of the attributed compute credits.
That confirmation matters, but it is the later layer. It still does not explain the warehouse idle gap.
Finally, TABLE_QUERY_PRUNING_HISTORY gives me the scan-efficiency answer. A low pruning ratio and high partitions scanned per query tell me this is not just a big query. It is a wasteful scan pattern.
My default decision here is boring on purpose.
I would not resize first because queue overload stayed low.
I would tighten the scan-heavy pattern, keep the warehouse idle gap visible, and check whether the task warehouse is sitting around longer than the workload justifies.
That boundary matters. As of 2026-02-10, Snowflake says suspending a warehouse drops its cache, recommends immediate suspension for tasks, and suggests at least 10 minutes for BI or SELECT-heavy warehouses that benefit from cache warmth (warehouse cache guidance).
In this example, the workload is a task warehouse. I care more about stopping empty minutes than preserving cache warmth between runs.
If the spike followed a dbt release, I also want the release path to stay legible.
That is why I keep the deployment record explicit in How I keep dbt Core deployments boring in production.
Tradeoffs
- Breaks when: I start with
QUERY_ATTRIBUTION_HISTORYand treat it as the whole bill → Mitigation: keep warehouse metering and the idle gap ahead of per-query attribution. - Breaks when: I rewrite SQL before checking whether the warehouse was overloaded, provisioning, or blocked → Mitigation: read
WAREHOUSE_LOAD_HISTORYbefore choosing sizing, scheduling, or query fixes. - Breaks when: I use
QUERY_HISTORYas if it already contains attributed compute cost → Mitigation: use it for immediate query symptoms, then come back later for lagged credit attribution. - Breaks when: I copy BI-oriented cache advice onto a task warehouse that should suspend quickly → Mitigation: match auto-suspend choices to the actual workload instead of one universal rule.
- Breaks when: I turn one warehouse investigation into a full Snowflake billing explainer → Mitigation: keep the post centered on one warehouse-spike path and use
METERING_DAILY_HISTORYonly for the billed-cost boundary.
Close
Next step: For one Snowflake warehouse that feels expensive right now, write a six-line note: metering change, idle-cost split, queue state, top query pattern, workload class, and first fix.
That note makes the first move less like generic tuning and more like a choice between sizing, scheduling, or workload separation.