What I write about: reliable analytics systems that stay explainable
I write about pipeline checks, metric definitions, dashboard trust, and delivery habits that keep analytics systems explainable when numbers move.
I write about the checks, decisions, and handoffs I use when a KPI moves, a pipeline slips, or two dashboards disagree.
If I cannot point to the query, release step, owner, or failed check behind a claim, I leave it out.
What I cover
- Pipeline checks that catch lateness, row loss, join failures, and broken publishes before the dashboard takes the hit
- Modeling patterns that keep row meaning, metric definitions, and dashboard numbers legible as the warehouse and team grow
- Delivery habits like release checklists, incident notes, ownership boundaries, and debugging order that hold up under pressure
A concrete example
Say Monday revenue drops in an executive dashboard and nobody expected it.
I do not start with the chart. I start with the smallest checks that narrow the failure.
- Confirm the source landed on time.
- Compare row counts and key joins in the model behind the chart.
- Check whether the metric logic or filter changed on purpose.
- Write down the current hypothesis before Slack fills with partial explanations.
That sequence is the first-pass order I use in When a dashboard number changes, I check these four things first.
That order tells me whether I am dealing with lateness, shape, joins, or definition drift before the room turns the issue into a tooling argument.
When those checks pass and two dashboards still disagree, I treat it as a definition problem, not a chart problem. I move to metric definition work before I touch the chart again.
When the incident is active and other people need updates, I keep a short incident note beside the debugging work.
I want each post to leave behind something I would run again: a check, a note template, a definition card, or a release habit that keeps the next incident explainable.
Tradeoffs
- Breaks when I start writing about tools before I can name the check, contract, or decision rule that matters. Mitigation: anchor the post to the operating artifact first and mention tooling only when it changes the workflow.
- Breaks when the useful answer depends on team size, latency expectations, or ownership structure. Mitigation: say that boundary directly and explain what I would change under those constraints.
- Breaks when the real example is too client-specific to publish cleanly. Mitigation: generalize names and numbers, but keep the failure mode, decision, and mitigation honest.
Close
Next step: Start with my default stack and workflow if you want the operating model behind these notes.
If the live pressure point is a late run, a KPI that moved, or a dashboard disagreement, use that situation to choose the first field note instead of reading the corpus front to back.