How I validate a metric after a backfill

Before I rerun a backfill, I write down the slices that should move and the ones that should stay flat. That short note tells me whether the rebuild is correcting history or merely moving numbers around.

When I validate a metric after a backfill, I start with the blast radius, not the total: which months, cohorts, or plans should move, which should stay flat, and what I need to explain in the dashboard release checklist before the dashboard returns to production.

If metric meaning or row meaning is still fuzzy, a backfill exposes it fast.

Problem

Imagine I fix a billing bug that misclassifies invoice reversals in monthly subscription revenue. The code fix is small. The risky part comes next: I need to backfill the last 90 days and republish finance dashboards that leadership already used in prior reviews.

A 1.8% move in one month is not automatically good or bad. The real question is whether the movement lands where I expected. If I cannot say which months, plans, or customer cohorts should change before I start the rebuild, I am not validating a metric. I am rerolling history and hoping the new total looks more credible.

Default approach

Freeze the metric definition, owner, and grain before I rerun anything.
Write a short backfill note that names the bug, the date window, the affected tables and dashboards, the expected movers, and the expected non-movers.
Snapshot the pre-backfill result so I can compare before and after instead of relying on memory, screenshots, or dashboard cache.
Compare the rebuild by stable slices such as billing month, plan type, or customer cohort, not just the top-line total.
Investigate two kinds of surprises: slices that moved when they should have stayed flat, and slices that stayed flat when they should have moved.
Publish the restatement boundary in one short note so downstream users know the rebuilt window, the expected movers, the expected non-movers, and when the rebuilt numbers become final.
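The surprise check in the list above comes down to comparing each slice's expectation with whether it actually moved. Here is a minimal Python sketch. It assumes each slice has already been labeled as an expected mover or non-mover. The function name `classify_slice` and the `tolerance` parameter (a threshold for ignoring rounding noise) are hypothetical and do not come from any tool.

```python
def classify_slice(expected_to_move: bool, delta: float, tolerance: float = 0.0) -> str:
    """Place one before/after slice comparison into one of four buckets.

    `tolerance` is a hypothetical threshold: an absolute delta at or below it
    counts as "flat" (for example, to absorb rounding differences).
    """
    moved = abs(delta) > tolerance
    if expected_to_move and moved:
        return "expected move"
    if not expected_to_move and not moved:
        return "expected flat"
    if moved:
        # Surprise 1: a slice that should have stayed flat moved.
        return "unexpected move"
    # Surprise 2: a slice that should have moved stayed flat.
    return "missing move"


print(classify_slice(True, 8500))                 # expected move
print(classify_slice(False, 3400))                # unexpected move
print(classify_slice(True, 0))                    # missing move
print(classify_slice(False, 0.4, tolerance=1.0))  # expected flat
```

Only the last two buckets need investigation before anything is republished.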

That sequence gives me an explanation before I republish anything. It also helps me separate a real metric fix from unrelated model drift.

Example: a 90-day revenue backfill

Before I start the rebuild, I want a short note like this:

Field	Value
Metric	subscription_revenue_monthly
Owner	finance analytics
Grain	one billed_account_id per billing_month
Reason for backfill	invoice reversals were excluded from the monthly revenue adjustment logic
Backfill window	2025-08-01 through 2025-10-31
Expected movers	monthly plans with reversal activity, customer cohorts billed inside the 90-day window
Expected non-movers	annual prepaid plans, free plans, months before 2025-08
Dashboards affected	finance MRR review, monthly retention pack
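If the note is kept as structured data, the expected-mover label for each slice can be derived from it instead of typed by hand. Below is a minimal Python sketch. It stores the window at month grain and assumes movers can be approximated by plan type inside the window. That is a coarser, hypothetical rule, because the real note's "with reversal activity" condition needs row-level data. `BackfillNote` and `expects_movement` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BackfillNote:
    metric: str
    owner: str
    grain: str
    reason: str
    window_start: str  # first rebuilt billing_month, "YYYY-MM"
    window_end: str    # last rebuilt billing_month, "YYYY-MM"
    mover_plan_types: frozenset = field(default_factory=frozenset)
    dashboards: tuple = ()

    def expects_movement(self, billing_month: str, plan_type: str) -> bool:
        # "YYYY-MM" strings sort chronologically, so plain comparison works.
        in_window = self.window_start <= billing_month <= self.window_end
        return in_window and plan_type in self.mover_plan_types


note = BackfillNote(
    metric="subscription_revenue_monthly",
    owner="finance analytics",
    grain="one billed_account_id per billing_month",
    reason="invoice reversals were excluded from the monthly revenue adjustment logic",
    window_start="2025-08",
    window_end="2025-10",
    mover_plan_types=frozenset({"monthly"}),
    dashboards=("finance MRR review", "monthly retention pack"),
)

print(note.expects_movement("2025-09", "monthly"))  # True
print(note.expects_movement("2025-09", "annual"))   # False
print(note.expects_movement("2025-07", "monthly"))  # False
```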

Then I compare before and after by a stable slice:

billing_month	plan_type	revenue_before	revenue_after	delta	expected_mover
2025-08	monthly	182400	190900	+8500	yes
2025-08	annual	264000	264000	0	no
2025-09	monthly	188100	195700	+7600	yes
2025-09	annual	271500	274900	+3400	no
2025-10	monthly	191800	199600	+7800	yes
2025-10	annual	279200	279200	0	no
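Before reading the expected column, I can recheck the table mechanically. This small Python sketch works over the rows above. It recomputes each delta from the before and after values and lists the slices where movement disagrees with expectation. The row tuple layout is an assumption of this sketch.

```python
# (billing_month, plan_type, revenue_before, revenue_after, reported_delta, expected_mover)
rows = [
    ("2025-08", "monthly", 182400, 190900, 8500, True),
    ("2025-08", "annual", 264000, 264000, 0, False),
    ("2025-09", "monthly", 188100, 195700, 7600, True),
    ("2025-09", "annual", 271500, 274900, 3400, False),
    ("2025-10", "monthly", 191800, 199600, 7800, True),
    ("2025-10", "annual", 279200, 279200, 0, False),
]

mismatches = []
for month, plan, before, after, reported, expected in rows:
    delta = after - before
    # Catch transcription errors in the comparison table itself.
    assert delta == reported, f"{month} {plan}: delta {delta} != reported {reported}"
    moved = delta != 0
    if moved != expected:
        mismatches.append((month, plan, delta))

print(mismatches)  # [('2025-09', 'annual', 3400)]
```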

The restatement boundary I want published beside the dashboard is short:

Restatement notice
- rebuilt window: 2025-08-01 through 2025-10-31
- expected movers: monthly plans with reversal activity inside that window
- expected non-movers: annual prepaid plans, free plans, months before 2025-08
- numbers outside this boundary should remain unchanged
- dashboard stays unpublished until unexpected movement is explained

The top-line move is directionally right, but the rebuild is still not validated. September annual-plan revenue changed even though annual plans were outside the scope of the bug. That one mismatch is enough for me to stop and investigate.

I keep the comparison query simple on purpose:

with before_backfill as (
  -- snapshot saved before the rerun, aggregated to the comparison slice
  select
    billing_month,
    plan_type,
    sum(revenue_usd) as revenue_before
  from finance.subscription_revenue_monthly_before
  group by 1, 2
),
after_backfill as (
  -- rebuilt table, aggregated to the same slice
  select
    billing_month,
    plan_type,
    sum(revenue_usd) as revenue_after
  from finance.subscription_revenue_monthly
  group by 1, 2
)
-- full outer join keeps slices that exist on only one side;
-- a missing side counts as 0 so the delta still shows up
select
  coalesce(a.billing_month, b.billing_month) as billing_month,
  coalesce(a.plan_type, b.plan_type) as plan_type,
  coalesce(b.revenue_before, 0) as revenue_before,
  coalesce(a.revenue_after, 0) as revenue_after,
  coalesce(a.revenue_after, 0) - coalesce(b.revenue_before, 0) as delta
from before_backfill b
full outer join after_backfill a
  on a.billing_month = b.billing_month
 and a.plan_type = b.plan_type
order by 1, 2;
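For a small snapshot, the same full-outer-join logic can be reproduced outside the warehouse to sanity-check the query. This is a minimal Python sketch. It assumes each side has already been aggregated to `{(billing_month, plan_type): revenue}`, and the function name `compare_slices` is hypothetical. As in the SQL, a slice missing from one side is treated as 0 rather than dropped. The `"unmapped"` slice below is a hypothetical example of a slice that appears only after the rebuild.

```python
def compare_slices(before: dict, after: dict) -> list:
    """Return (billing_month, plan_type, before, after, delta) for every slice on either side."""
    keys = sorted(before.keys() | after.keys())  # key union plays the role of the full outer join
    result = []
    for key in keys:
        b = before.get(key, 0)  # like coalesce(b.revenue_before, 0)
        a = after.get(key, 0)   # like coalesce(a.revenue_after, 0)
        result.append((*key, b, a, a - b))
    return result


before = {("2025-08", "monthly"): 182400, ("2025-08", "annual"): 264000}
after = {
    ("2025-08", "monthly"): 190900,
    ("2025-08", "annual"): 264000,
    ("2025-08", "unmapped"): 500,  # hypothetical slice present only after the rebuild
}

for row in compare_slices(before, after):
    print(row)
# ('2025-08', 'annual', 264000, 264000, 0)
# ('2025-08', 'monthly', 182400, 190900, 8500)
# ('2025-08', 'unmapped', 0, 500, 500)
```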

If I see that unexpected annual delta, I do not republish the dashboard yet. My next check is whether a dimension change or a plan-mapping fix got bundled into the same release, because that kind of bundling is how a clean metric correction gets confused with a wider model change. If something was bundled, I validate that unexpected slice on its own before I reopen the dashboard.
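One way to check for a bundled mapping change is to compare plan assignments at the metric's grain (one billed_account_id per billing_month) across the two snapshots. This is a minimal sketch. It assumes both snapshots can be read as `{(billed_account_id, billing_month): plan_type}`. The function name and the sample account IDs are hypothetical.

```python
def remapped_accounts(before: dict, after: dict) -> list:
    """List (account, month, old_plan, new_plan) where plan_type differs between snapshots.

    Keys present on only one side are reported with None for the missing plan.
    """
    changes = []
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes.append((*key, old, new))
    return changes


before = {("acct_1", "2025-09"): "annual", ("acct_2", "2025-09"): "monthly"}
after = {("acct_1", "2025-09"): "annual", ("acct_2", "2025-09"): "annual"}

print(remapped_accounts(before, after))
# [('acct_2', '2025-09', 'monthly', 'annual')]
```

A non-empty result inside a slice that should have stayed flat suggests a mapping change rather than the reversal fix. That slice then gets its own before-and-after comparison.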

This is why I do not validate a metric after a backfill by comparing totals alone. A metric can land close to the expected overall number and still move the wrong cohort, the wrong plan, or the wrong time period.
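The comparison table already shows how small the wrong movement can look at the top line. The short Python calculation below uses only its numbers. The expected movers account for 23,900 of the 27,300 total delta, and the unexplained 3,400 is about 0.25% of pre-backfill revenue across the window.

```python
# Values copied from the before/after comparison table.
before = [182400, 264000, 188100, 271500, 191800, 279200]
after = [190900, 264000, 195700, 274900, 199600, 279200]
expected_mover = [True, False, True, False, True, False]

deltas = [a - b for a, b in zip(after, before)]
total_delta = sum(deltas)                                             # 27300
expected_delta = sum(d for d, e in zip(deltas, expected_mover) if e)  # 23900
unexplained = total_delta - expected_delta                            # 3400
share = unexplained / sum(before)                                     # about 0.0025

print(total_delta, expected_delta, unexplained, f"{share:.2%}")
# 27300 23900 3400 0.25%
```

A quarter of a percent is easy to dismiss in a total. Split out by slice, it is a whole plan type that should not have moved.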

If the backfill still produces unexplained movement after this comparison, I treat it like a fresh reliability problem. At that point I go back to the checks I add before I trust a pipeline or the incident note template I use when a dashboard number changes.

Tradeoffs

Breaks when: source history is incomplete, overwritten, or missing key fields needed to reconstruct the old logic → Mitigation: make the reconstruction limit explicit and say which periods or slices are no longer fully trustworthy.
Breaks when: a slowly changing dimension or mapping rule changed in the same deployment as the metric fix → Mitigation: isolate the logic repair from the dimensional change, or validate each one with its own before-and-after comparison.
Breaks when: the team only checks the total delta and never checks the shape of the change → Mitigation: compare by stable slices such as month, cohort, plan, or region before republishing dashboards.
Breaks when: downstream users assume historical numbers never move after month-end → Mitigation: publish a restatement window and label which outputs are settled versus still allowed to change.
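The status label for the last mitigation can be derived from the same restatement window. This is a minimal sketch. It assumes months are stored as `YYYY-MM` strings and that a single hypothetical `finalized` flag flips once unexpected movement has been explained. Real sign-off rules will differ by team.

```python
def restatement_status(billing_month: str, window_start: str, window_end: str,
                       finalized: bool) -> str:
    """Label one month of output for downstream users.

    Months outside the rebuilt window are unchanged by this backfill. Months inside
    it stay provisional until the rebuild is signed off (hypothetical `finalized` flag).
    """
    # "YYYY-MM" strings compare chronologically.
    if not (window_start <= billing_month <= window_end):
        return "unchanged"
    return "restated, final" if finalized else "restated, provisional"


for month in ["2025-07", "2025-08", "2025-10"]:
    print(month, restatement_status(month, "2025-08", "2025-10", finalized=False))
# 2025-07 unchanged
# 2025-08 restated, provisional
# 2025-10 restated, provisional
```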

Close

Next step: For one metric that can be restated, write the dates, cohorts, and dashboards you expect to move if you backfill the last 90 days.

That expected-movers list gives the rerun a boundary before the total delta becomes the whole argument.
