How I validate a metric after a backfill
I validate a rebuilt metric with a short backfill note, a slice-by-slice comparison, and a published restatement boundary before I republish the dashboard.
Before I rerun a backfill, I write down the slices that should move and the ones that should stay flat. That short note tells me whether the rebuild is correcting history or merely moving numbers around.
When I validate a metric after a backfill, I start with the blast radius, not the total: which months, cohorts, or plans should move, which ones should stay flat, and what I need to explain in the BI release checklist before the dashboard returns to production.
If metric meaning or row meaning is still fuzzy, a backfill exposes it fast.
Problem
Imagine I fix a billing bug that misclassifies invoice reversals in monthly subscription revenue. The code fix is small. The risky part comes next: I need to backfill the last 90 days and republish finance dashboards that leadership already used in prior reviews.
A 1.8% move in February is not automatically good or bad. The real question is whether the movement lands where I expected. If I cannot say which months, plans, or customer cohorts should change before I start the rebuild, I am not validating a metric. I am rerolling history and hoping the new total looks more credible.
Default approach
- Freeze the metric definition, owner, and grain before I rerun anything.
- Write a short backfill note that names the bug, the date window, the affected tables and dashboards, the expected movers, and the expected non-movers.
- Snapshot the pre-backfill result so I can compare before and after instead of relying on memory, screenshots, or dashboard cache.
- Compare the rebuild by stable slices such as billing month, plan type, or customer cohort, not just the top-line total.
- Investigate two kinds of surprises: slices that moved when they should have stayed flat, and slices that stayed flat when they should have moved.
- Publish the restatement boundary in one short note so downstream users know the rebuilt window, the expected movers, the expected non-movers, and when the rebuilt numbers become final.
That sequence gives me an explanation before I republish anything. It also helps me separate a real metric fix from unrelated model drift.
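One way to make the first two steps enforceable is to treat the backfill note as a small record that blocks the rerun until every field is filled in. The sketch below is a minimal illustration, not a prescribed tool: the `BackfillNote` class, its field names, and the `ready_to_rerun` check are all hypothetical, and the sample values are borrowed from the revenue example in this post.

```python
from dataclasses import dataclass, field


@dataclass
class BackfillNote:
    """Hypothetical record mirroring the fields of a short backfill note."""
    metric: str
    owner: str
    grain: str
    reason: str
    window_start: str  # ISO date, inclusive
    window_end: str    # ISO date, inclusive
    expected_movers: list[str] = field(default_factory=list)
    expected_non_movers: list[str] = field(default_factory=list)
    dashboards: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty."""
        return [name for name, value in vars(self).items() if not value]

    def ready_to_rerun(self) -> bool:
        """Allow the rerun only when every field has been filled in."""
        return not self.missing_fields()


note = BackfillNote(
    metric="subscription_revenue_monthly",
    owner="finance analytics",
    grain="one row per billed_account_id per billing_month",
    reason="invoice reversals were excluded from the adjustment logic",
    window_start="2025-08-01",
    window_end="2025-10-31",
    expected_movers=["monthly plans with reversal activity"],
    # expected_non_movers and dashboards are deliberately left empty here
)
print(note.missing_fields())  # ['expected_non_movers', 'dashboards']
print(note.ready_to_rerun())  # False
```

The point of the check is the order of operations: the non-movers have to be written down before the rebuild, not inferred from whatever stayed flat afterwards.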
Example: a 90-day revenue backfill
Before I start the rebuild, I want a short note like this:
| Field | Value |
|---|---|
| Metric | subscription_revenue_monthly |
| Owner | finance analytics |
| Grain | one row per billed_account_id per billing_month |
| Reason for backfill | invoice reversals were excluded from the monthly revenue adjustment logic |
| Backfill window | 2025-08-01 through 2025-10-31 |
| Expected movers | monthly plans with reversal activity; customer cohorts billed inside the 90-day window |
| Expected non-movers | annual prepaid plans; free plans; months before 2025-08 |
| Dashboards affected | finance MRR review; monthly retention pack |
Then I compare before and after by a stable slice:
| billing_month | plan_type | revenue_before | revenue_after | delta | expected_to_move |
|---|---|---|---|---|---|
| 2025-08 | monthly | 182400 | 190900 | +8500 | yes |
| 2025-08 | annual | 264000 | 264000 | 0 | no |
| 2025-09 | monthly | 188100 | 195700 | +7600 | yes |
| 2025-09 | annual | 271500 | 274900 | +3400 | no |
| 2025-10 | monthly | 191800 | 199600 | +7800 | yes |
| 2025-10 | annual | 279200 | 279200 | 0 | no |
The restatement boundary I want published beside the dashboard is short:
Restatement notice
- rebuilt window: 2025-08-01 through 2025-10-31
- expected movers: monthly plans with reversal activity inside that window
- expected non-movers: annual prepaid plans, free plans, months before 2025-08
- numbers outside this boundary should remain unchanged
- dashboard stays unpublished until unexpected movement is explained
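The top-line move is directionally right: total revenue across the window rises by 27,300, and 23,900 of that comes from the monthly plans I expected to move. The rebuild is still not validated. September annual-plan revenue changed by +3,400 even though annual plans were outside the bug. That one mismatch is enough for me to stop and investigate.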
I keep the comparison query simple on purpose:
```sql
-- Pre-backfill snapshot, aggregated to the comparison slice.
with before_backfill as (
    select
        billing_month,
        plan_type,
        sum(revenue_usd) as revenue_before
    from finance.subscription_revenue_monthly_before
    group by 1, 2
),
-- Rebuilt table, aggregated to the same slice.
after_backfill as (
    select
        billing_month,
        plan_type,
        sum(revenue_usd) as revenue_after
    from finance.subscription_revenue_monthly
    group by 1, 2
)
-- Full outer join keeps slices that exist on only one side;
-- coalesce treats a missing side as zero revenue.
select
    coalesce(a.billing_month, b.billing_month) as billing_month,
    coalesce(a.plan_type, b.plan_type) as plan_type,
    coalesce(b.revenue_before, 0) as revenue_before,
    coalesce(a.revenue_after, 0) as revenue_after,
    coalesce(a.revenue_after, 0) - coalesce(b.revenue_before, 0) as delta
from before_backfill b
full outer join after_backfill a
    on a.billing_month = b.billing_month
    and a.plan_type = b.plan_type
order by 1, 2;
```
If I see that unexpected annual delta, I do not republish the dashboard yet. My next check is whether a dimension change or a plan-mapping fix got bundled into the same release, because that bundling is how a clean metric correction gets confused with a wider model change. Then I validate the unexpected slice on its own before I reopen the dashboard.
This is why I do not validate a metric after a backfill by comparing totals alone. A metric can land close to the expected overall number and still move the wrong cohort, the wrong plan, or the wrong time period.
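The two kinds of surprise from the default approach can be checked mechanically once the comparison rows are in hand. The following is a rough sketch rather than the query I actually run: the rows are copied from the before-and-after table in the example, `find_surprises` is a hypothetical helper, and it assumes "expected to move" can be decided by plan type alone, which a real backfill note may not allow.

```python
# Rows copied from the before/after comparison table in the example:
# (billing_month, plan_type, revenue_before, revenue_after)
ROWS = [
    ("2025-08", "monthly", 182400, 190900),
    ("2025-08", "annual", 264000, 264000),
    ("2025-09", "monthly", 188100, 195700),
    ("2025-09", "annual", 271500, 274900),
    ("2025-10", "monthly", 191800, 199600),
    ("2025-10", "annual", 279200, 279200),
]

# Assumption for this sketch: expected movement depends on plan type only.
EXPECTED_MOVER_PLANS = {"monthly"}


def find_surprises(rows, expected_mover_plans):
    """Return slices whose movement contradicts the backfill note."""
    moved_unexpectedly = []  # moved, but should have stayed flat
    flat_unexpectedly = []   # stayed flat, but should have moved
    for month, plan, before, after in rows:
        delta = after - before
        should_move = plan in expected_mover_plans
        if delta != 0 and not should_move:
            moved_unexpectedly.append((month, plan, delta))
        elif delta == 0 and should_move:
            flat_unexpectedly.append((month, plan, delta))
    return moved_unexpectedly, flat_unexpectedly


moved, flat = find_surprises(ROWS, EXPECTED_MOVER_PLANS)
total_delta = sum(after - before for _, _, before, after in ROWS)

print(total_delta)  # 27300
print(moved)        # [('2025-09', 'annual', 3400)]
print(flat)         # []
```

The total of 27,300 says nothing about the 3,400 that landed in the wrong plan; only the per-slice check surfaces it. The exact `delta != 0` comparison works here because the sample values are whole numbers; with fractional currency amounts a small tolerance would be the safer choice.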
If the backfill still produces unexplained movement after this comparison, I treat it like a fresh reliability problem. At that point I go back to the checks I add before I trust a pipeline or the incident note template I use when a dashboard number changes.
Tradeoffs
- Breaks when: source history is incomplete, overwritten, or missing key fields needed to reconstruct the old logic → Mitigation: make the reconstruction limit explicit and say which periods or slices are no longer fully trustworthy.
- Breaks when: a slowly changing dimension or mapping rule changed in the same deployment as the metric fix → Mitigation: isolate the logic repair from the dimensional change, or validate each one with its own before-and-after comparison.
- Breaks when: the team only checks the total delta and never checks the shape of the change → Mitigation: compare by stable slices such as month, cohort, plan, or region before republishing dashboards.
- Breaks when: downstream users assume historical numbers never move after month-end → Mitigation: publish a restatement window and label which outputs are settled versus still allowed to change.
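The last mitigation, labeling outputs as settled or still open to change, can also be expressed as a small check. This is only a sketch under stated assumptions: the window dates come from the restatement notice in the example, the `restated` and `settled` labels and both function names are hypothetical, the window is assumed to align with month boundaries, and the 2025-07 and 2025-11 deltas are invented purely to show a violation.

```python
from datetime import date

# Rebuilt window taken from the restatement notice in the example.
WINDOW_START = date(2025, 8, 1)
WINDOW_END = date(2025, 10, 31)


def month_status(billing_month: str) -> str:
    """Label a 'YYYY-MM' month as 'restated' or 'settled'."""
    year, month = (int(part) for part in billing_month.split("-"))
    first_day = date(year, month, 1)
    return "restated" if WINDOW_START <= first_day <= WINDOW_END else "settled"


def boundary_violations(deltas: dict[str, int]) -> list[str]:
    """Return settled months whose value changed anyway."""
    return sorted(
        month
        for month, delta in deltas.items()
        if month_status(month) == "settled" and delta != 0
    )


# Monthly deltas for 2025-08 to 2025-10 are summed from the example table;
# the 2025-07 and 2025-11 values are made up for illustration.
deltas = {
    "2025-07": 0,
    "2025-08": 8500,
    "2025-09": 11000,
    "2025-10": 7800,
    "2025-11": 120,
}

print(month_status("2025-09"))      # restated
print(boundary_violations(deltas))  # ['2025-11']
```

Any month that comes back from `boundary_violations` is a number that moved outside the published boundary, which is exactly the case downstream users were told would not happen.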
Close
Next step: For one metric that can be restated, write the dates, cohorts, and dashboards you expect to move if you backfill the last 90 days.
That expected-movers list gives the rerun a boundary before the total delta becomes the whole argument.