<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Berhan Turkkaynagi</title><description>Field notes on keeping analytics systems reliable.</description><link>https://berhanturkkaynagi.com/</link><language>en-us</language><copyright>© Berhan Turkkaynagi</copyright><atom:link href="https://berhanturkkaynagi.com/rss.xml" rel="self" type="application/rss+xml"/><item><title>The metric ownership review I run before quarter close</title><link>https://berhanturkkaynagi.com/blog/posts/metric-ownership-quarter-close-review/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/metric-ownership-quarter-close-review/</guid><description>Before quarter-close reporting locks, I use a metric ownership review card to make changed definitions, owners, open questions, affected surfaces, and safe-to-lock decisions visible.</description><pubDate>Sat, 16 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Quarter close is when metric ownership gaps stop being abstract.&lt;/p&gt;
&lt;p&gt;A KPI definition can change quietly during the quarter. An exclusion can move from one dashboard to another. A finance owner can approve a label in one surface while operations still reads the old rule in another. When the close package is about to lock, those small gaps become reporting risk.&lt;/p&gt;
&lt;p&gt;I do not want to relitigate every metric in the close meeting. I want a short review for the numbers that changed, affect a decision, or still carry an open question.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Quarter-close reporting puts a deadline on metric ambiguity.&lt;/p&gt;
&lt;p&gt;The issue is rarely that nobody cares about the number. The issue is that the current owner, latest definition, affected surface, and unresolved question are scattered across release notes, dashboard subtitles, pull requests, chats, and meeting memory.&lt;/p&gt;
&lt;p&gt;That scatter matters when the report is close-facing. If &lt;code&gt;active_customer&lt;/code&gt; changed its refund exclusion this quarter, finance needs to know which definition is in the close dashboard. If &lt;code&gt;fill_rate&lt;/code&gt; excludes manual holds in the operations dashboard, the service narrative needs to say that before the deck freezes. If a revenue-recognition reporting label was split for visibility, the analytics surface needs a finance-owned reporting-boundary answer before anyone treats it as close evidence.&lt;/p&gt;
&lt;p&gt;The failure mode is predictable: the review becomes a debate about every metric in the catalog, or it happens after the reporting surface has already locked.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;p&gt;I keep the review narrow.&lt;/p&gt;
&lt;p&gt;A metric enters the quarter-close ownership review only when at least one of these is true:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the definition changed this quarter;&lt;/li&gt;
&lt;li&gt;an exclusion or status bucket changed this quarter;&lt;/li&gt;
&lt;li&gt;the definition owner changed;&lt;/li&gt;
&lt;li&gt;the metric appears in a close-facing dashboard, packet, export, or executive narrative;&lt;/li&gt;
&lt;li&gt;an open question could change whether the surface is safe to lock.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For each included metric, I want the same fields in one place: reporting definition, decision it supports, owner, changed definition or exclusion, last-change check, open question, reporting surface affected, safe-to-lock judgment, and next owner/action.&lt;/p&gt;
&lt;p&gt;The most important field is the safe-to-lock judgment. A metric can be &lt;code&gt;safe to lock&lt;/code&gt;, &lt;code&gt;safe with owner note&lt;/code&gt;, or &lt;code&gt;hold&lt;/code&gt;. That prevents two bad defaults: blocking every surface because one edge case exists, or locking a close-facing number while the owner question is still floating in chat.&lt;/p&gt;
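&lt;p&gt;The inclusion rule and the three-way judgment can be sketched in a few lines of Python. This is a minimal illustration, not a tool I am claiming exists: the &lt;code&gt;MetricReview&lt;/code&gt; fields, the &lt;code&gt;open_question_can_change_report&lt;/code&gt; flag, and both function names are hypothetical, and the rule that an open question forces a hold only when it can change the reported number, interpretation, or surface eligibility is my reading of the criteria above.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum


class Judgment(Enum):
    SAFE_TO_LOCK = "safe to lock"
    SAFE_WITH_OWNER_NOTE = "safe with owner note"
    HOLD = "hold"


@dataclass
class MetricReview:
    name: str
    definition_changed: bool = False
    exclusion_or_bucket_changed: bool = False
    owner_changed: bool = False
    close_facing: bool = False
    has_open_question: bool = False
    # Hypothetical flag: the open question could change the reported number,
    # its interpretation, or whether the surface belongs in the close packet.
    open_question_can_change_report: bool = False


def enters_review(metric):
    """True when at least one inclusion trigger applies."""
    return any((
        metric.definition_changed,
        metric.exclusion_or_bucket_changed,
        metric.owner_changed,
        metric.close_facing,
        metric.has_open_question,
    ))


def judge(metric):
    """Assumed rule: hold only for questions that can change the report."""
    if not metric.has_open_question:
        return Judgment.SAFE_TO_LOCK
    if metric.open_question_can_change_report:
        return Judgment.HOLD
    return Judgment.SAFE_WITH_OWNER_NOTE


active_customer = MetricReview(
    name="active_customer",
    exclusion_or_bucket_changed=True,
    close_facing=True,
    has_open_question=True,
)
fill_rate = MetricReview(
    name="fill_rate",
    exclusion_or_bucket_changed=True,
    close_facing=True,
    has_open_question=True,
    open_question_can_change_report=True,
)

for metric in (active_customer, fill_rate):
    if enters_review(metric):
        print(metric.name, judge(metric).value)
```

&lt;p&gt;The point of the sketch is the shape of the decision, not automation: the flag that separates a note from a hold is still a human call made by the metric owner.&lt;/p&gt;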
&lt;h2 id=&quot;example-the-ownership-card-i-want-before-reporting-locks&quot;&gt;Example: the ownership card I want before reporting locks&lt;/h2&gt;
&lt;p&gt;Here is the card I would rather write before the close packet freezes:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Metric ownership review card&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Close period: 2025 Q3 close package&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Review rule: only decision-critical metrics with changed definitions, exclusions, owners, or reporting surfaces&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;1) active_customer&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- reporting definition: billed account with at least one paid invoice in the close period&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- decision it supports: revenue retention and customer-count narrative for the close package&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- owner: finance analytics&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- definition/exclusion changed this quarter: refunded invoices are excluded; trial-only accounts remain excluded&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- last-change check: semantic definition PR and dashboard subtitle both show the refund exclusion&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- open question: confirm whether migrated accounts with invoice credits stay in the billed-account population&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- reporting surface affected: finance close dashboard and executive close narrative&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- safe-to-lock judgment: safe with owner note until migrated-account question is resolved&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- next owner/action: finance analytics confirms migrated-account handling before the close deck freezes&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;2) fill_rate&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- reporting definition: complete customer orders filled from available stock on the first warehouse execution attempt&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- decision it supports: quarter-end service and fulfillment narrative&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- owner: warehouse operations analytics&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- definition/exclusion changed this quarter: customer-requested future ship dates and manual holds are excluded&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- last-change check: metric card and operations dashboard filter note both show the new exclusions&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- open question: decide whether split shipments stay out of the close-facing metric or become a separate exception note&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- reporting surface affected: operations close dashboard and supply-chain service summary&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- safe-to-lock judgment: hold until split-shipment handling is named in the surface note&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- next owner/action: warehouse operations records split-shipment handling and republishes the filter note&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;3) revenue_recognition_reporting_definition&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- reporting definition: reporting surface uses the finance-approved recognition status label for close review&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- decision it supports: whether the analytics surface is safe to reference in the quarter-close package&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- owner: finance analytics with finance policy owner review&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- definition/exclusion changed this quarter: analytics field was renamed and one deferred-status bucket was split for reporting visibility&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- last-change check: close dashboard, semantic definition, and release note use the same status labels&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- open question: finance policy owner confirms whether the split bucket is explanatory only or blocks the close-facing surface&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- reporting surface affected: finance close dashboard and analytics release note&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- safe-to-lock judgment: hold the surface until the policy owner answers; this card does not define accounting treatment&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- next owner/action: finance analytics records the policy-owner answer or removes the surface from the close packet&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first row is not perfectly clean, but it may be safe with a note. The refund exclusion is visible in both the semantic definition and the dashboard subtitle. The open question is narrow: migrated accounts with invoice credits. I would not hold the entire close package for that by default, but I would name the owner and put the note beside the affected surface.&lt;/p&gt;
&lt;p&gt;The fill-rate row is different. The metric affects the quarter-end service narrative, and split shipments can change the story. If the surface does not say whether split shipments are excluded, included, or called out separately, I would hold that close-facing surface until the owner records the handling.&lt;/p&gt;
&lt;p&gt;The revenue-recognition row has the strongest boundary. The analytics artifact is not deciding accounting treatment. It is deciding whether the reporting surface is safe to reference. If the policy owner has not answered whether the split bucket is explanatory or blocking, the correct analytics decision is to hold the surface or remove it from the close packet.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: the review tries to cover every metric in the catalog → Mitigation: include only changed, decision-critical metrics or surfaces with unresolved close-facing questions.&lt;/li&gt;
&lt;li&gt;Breaks when: open questions stay in meeting notes or chat threads → Mitigation: record the affected surface, owner, and next action on the card before the report locks.&lt;/li&gt;
&lt;li&gt;Breaks when: every uncertainty becomes a blocker → Mitigation: separate &lt;code&gt;safe with owner note&lt;/code&gt; from &lt;code&gt;hold&lt;/code&gt;, and reserve holds for questions that can change the reported number, interpretation, or surface eligibility.&lt;/li&gt;
&lt;li&gt;Breaks when: revenue-recognition language starts sounding like policy guidance → Mitigation: keep the card at the reporting-definition boundary and let finance or policy owners decide treatment.&lt;/li&gt;
&lt;li&gt;Breaks when: the card becomes a stale governance artifact → Mitigation: use it for the close window, store it beside the release note or close checklist, and replace it when the next definition change happens.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Pick one close-facing metric that changed this quarter and write the ownership card before the reporting surface freezes: definition, owner, last-change check, open question, safe-to-lock judgment, and next action.&lt;/p&gt;
&lt;p&gt;The goal is not perfect metric governance. It is a calmer close meeting because the changed numbers already have owners, surface notes, and explicit lock or hold decisions.&lt;/p&gt;</content:encoded><category>metrics</category><category>operations</category><category>analytics-engineering</category></item><item><title>The source-trust scorecard I use before promoting a table to production</title><link>https://berhanturkkaynagi.com/blog/posts/source-trust-scorecard-production-input/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/source-trust-scorecard-production-input/</guid><description>Before I promote a source table into production dependencies, I score observed cadence, key stability, corrections, null risk, late arrivals, and owner response so the promotion decision is explicit and defensible.</description><pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The source-trust scorecard is the promotion record between a valid contract and a production dependency.&lt;/p&gt;
&lt;p&gt;The dangerous moment is the promotion step. The columns land, row volume looks normal, and the first schema checks pass. Then a downstream model inherits key churn, late corrections, or an owner-response gap nobody reviewed before the table entered the daily dashboard path.&lt;/p&gt;
&lt;p&gt;My default is to separate the source contract from the promotion decision. The contract says what the source is supposed to do. The source-trust scorecard records what the source has actually done since ingestion started and whether that behavior is safe enough to depend on.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;I do not want the first real test of a source table to happen after a critical model depends on it.&lt;/p&gt;
&lt;p&gt;That is the trap with contract-compliant sources. The agreement may define row meaning, cadence, key rules, accepted values, and an owner, but the observation window can still show behavior that would make production noisy. A table can arrive late twice in two weeks. A vendor can merge catalogs and rekey items. A correction can restate data outside the expected window. A new status can appear before downstream logic knows what to do with it.&lt;/p&gt;
&lt;p&gt;Those facts do not always make the source unusable. They do mean I need a promotion record that says which uses can move forward, which ones need guardrails, and which ones stay blocked.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;p&gt;I use the scorecard after the initial source contract exists and before the table becomes a dependency for a critical model, dashboard, or recurring review.&lt;/p&gt;
&lt;p&gt;The review stays small on purpose:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Check cadence reliability against the promised landing pattern.&lt;/li&gt;
&lt;li&gt;Check key stability against the joins, deduplication, and history the downstream model needs.&lt;/li&gt;
&lt;li&gt;Check whether corrections are predictable, bounded, and explainable.&lt;/li&gt;
&lt;li&gt;Check null and domain risk on fields the downstream logic treats as required.&lt;/li&gt;
&lt;li&gt;Check whether late arrivals are normal, visible, and compatible with the reporting window.&lt;/li&gt;
&lt;li&gt;Check whether the source owner responds fast enough when evidence shows a defect.&lt;/li&gt;
&lt;li&gt;Write the promotion decision as &lt;code&gt;promote&lt;/code&gt;, &lt;code&gt;promote with guardrails&lt;/code&gt;, or &lt;code&gt;hold&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Every score needs a reference: a check result, run log, source-owner note, issue, or review record. If I cannot point to the evidence, I treat the score as an opinion and keep it out of the promotion decision.&lt;/p&gt;
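&lt;p&gt;A rough Python sketch of that evidence rule follows. The &lt;code&gt;Score&lt;/code&gt; record, the &lt;code&gt;roll_up&lt;/code&gt; function, and the per-dimension values &lt;code&gt;promote&lt;/code&gt;, &lt;code&gt;guardrail&lt;/code&gt;, and &lt;code&gt;hold&lt;/code&gt; are hypothetical names for illustration. The roll-up rule, where the most restrictive evidenced score sets the decision, is an assumption rather than something the scorecard prescribes.&lt;/p&gt;

```python
from dataclasses import dataclass

# Ordered from least to most restrictive (assumed ordering).
SEVERITY = ["promote", "guardrail", "hold"]
DECISION = {
    "promote": "promote",
    "guardrail": "promote with guardrails",
    "hold": "hold",
}


@dataclass
class Score:
    dimension: str
    score: str  # one of SEVERITY
    # Check result, run log, source-owner note, issue, or review record.
    evidence_ref: str = ""


def roll_up(scores):
    """Return (decision, opinions).

    Scores without an evidence reference are treated as opinions and are
    left out of the decision. The decision is None when nothing is evidenced.
    """
    evidenced = [s for s in scores if s.evidence_ref.strip()]
    opinions = [s.dimension for s in scores if not s.evidence_ref.strip()]
    if not evidenced:
        return None, opinions
    worst = max(evidenced, key=lambda s: SEVERITY.index(s.score))
    return DECISION[worst.score], opinions


scores = [
    Score("cadence reliability", "guardrail", "load run log"),
    Score("key stability", "guardrail", "duplicate-key check result"),
    Score("owner response", "promote", ""),  # no evidence, so an opinion
]
decision, opinions = roll_up(scores)
print(decision)
print(opinions)
```

&lt;p&gt;Returning the unevidenced dimensions alongside the decision keeps the gap visible instead of silently dropping it.&lt;/p&gt;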
&lt;h2 id=&quot;example-a-vendor-inventory-source-that-needs-guardrails&quot;&gt;Example: a vendor inventory source that needs guardrails&lt;/h2&gt;
&lt;p&gt;Here is the kind of source that makes the scorecard useful: a vendor inventory feed that already has a lightweight source contract.&lt;/p&gt;
&lt;p&gt;The table lands with expected columns. Row volume stays near the normal range. Basic schema checks pass. A shallow review would promote it into both the daily replenishment model and the executive inventory snapshot.&lt;/p&gt;
&lt;p&gt;The observation window says something narrower:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Source trust scorecard&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Source table: vendor_inventory_snapshot&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Candidate use: daily available-to-promise and replenishment models&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Observation window: 14 daily loads after initial source contract&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Promotion decision: promote with guardrails&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Decision owner: analytics engineering, reviewed with inventory source owner&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;1) Cadence reliability&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Expected: daily landing by 05:30 ET&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Observed evidence: 12 of 14 loads landed on time; 2 arrived 45-70 minutes late with source-owner notice&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Risk note: acceptable for noon planning refresh, not safe for early-morning executive snapshot&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Score: guardrail&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;2) Key stability&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Expected: one row per source_item_id, location_id, snapshot_date&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Observed evidence: 0 duplicate rows on stable keys, but 3% of items were rekeyed after vendor catalog merge&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Risk note: downstream joins need a rekey bridge before production promotion&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Score: guardrail&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;3) Correction pattern&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Expected: corrections can restate the last 72 hours only&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Observed evidence: two corrections landed inside 72 hours; one correction restated a five-day-old partition&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Risk note: publish recent days as preliminary until correction window is proven or exception is explained&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Score: guardrail&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;4) Null/domain risk&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Expected: item_id, location_id, on_hand_qty, and status are populated with accepted values&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Observed evidence: location_id null rate stayed below 0.2%; status introduced HOLD without prior notice&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Risk note: HOLD needs an accepted-value rule and owner-confirmed downstream handling&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Score: guardrail&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;5) Late-arrival pattern&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Expected: late rows are rare and visible through load metadata&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Observed evidence: late rows cluster around vendor weekend reconciliation and affect Monday snapshots&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Risk note: Monday planning slice needs a freshness and late-row annotation before release&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Score: guardrail&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;6) Owner response&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Expected: source owner acknowledges production-blocking defects same business day&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Observed evidence: owner answered cadence and status questions same day; rekey explanation took two days&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Risk note: key churn is the only unresolved promotion blocker&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Score: guardrail&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Decision&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Promote with guardrails after adding the rekey bridge, accepted-value rule for HOLD, and preliminary label for recent correction windows.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Hold executive-facing snapshots until the next observation window shows cadence and key behavior are stable.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That decision is not a clean promote, and it is not a full hold.&lt;/p&gt;
&lt;p&gt;The source is useful for the noon replenishment path after the rekey bridge and &lt;code&gt;HOLD&lt;/code&gt; handling are in place. Recent partitions need a preliminary label until correction behavior is clearer. The executive snapshot waits because early-morning cadence and key stability are still the two ways this table can put the wrong number in front of the wrong room.&lt;/p&gt;
&lt;p&gt;The scorecard keeps that judgment visible. It shows why one production use can move forward while another stays blocked, and it gives the next reviewer the evidence behind the boundary.&lt;/p&gt;
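&lt;p&gt;Two of the observed-evidence lines in that scorecard come from checks simple enough to sketch. The Python below is illustrative only: the function names and sample values are invented, landing times are treated as naive local times with no time-zone handling, and a real check would read load metadata rather than a hard-coded list.&lt;/p&gt;

```python
from collections import Counter
from datetime import time


def late_loads(landing_times, deadline):
    """Return the landing times that arrived after the promised deadline."""
    return [landed for landed in landing_times if landed > deadline]


def duplicate_keys(rows, key_fields):
    """Return key tuples that appear on more than one row."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return [key for key, count in counts.items() if count > 1]


# Invented sample: 12 on-time loads and 2 late ones against a 05:30 deadline.
landings = [time(5, 10)] * 12 + [time(6, 15), time(6, 40)]
late = late_loads(landings, deadline=time(5, 30))
print(len(landings) - len(late), "of", len(landings), "loads on time")

# Invented sample rows at the expected grain:
# one row per source_item_id, location_id, snapshot_date.
rows = [
    {"source_item_id": "A1", "location_id": "L1", "snapshot_date": "2025-05-01"},
    {"source_item_id": "A1", "location_id": "L2", "snapshot_date": "2025-05-01"},
]
print(duplicate_keys(rows, ("source_item_id", "location_id", "snapshot_date")))
```

&lt;p&gt;A grain check like this would not catch the rekeying described in the scorecard, because rekeyed items are still unique on the new keys. That gap is why key stability needs its own evidence line.&lt;/p&gt;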
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: teams use the scorecard to block exploratory analysis or every low-risk staging table → Mitigation: reserve it for sources that will feed critical models, dashboards, recurring reviews, or other downstream commitments.&lt;/li&gt;
&lt;li&gt;Breaks when: scoring becomes subjective gatekeeping or a permanent grade → Mitigation: record the exact check, run log, source-owner note, issue, or review evidence behind each score, then revisit the score when source behavior changes.&lt;/li&gt;
&lt;li&gt;Breaks when: teams wait for a perfect source and delay useful delivery → Mitigation: allow &lt;code&gt;promote with guardrails&lt;/code&gt; when the limits are explicit, such as preliminary labels, blocked dashboard surfaces, rekey bridges, or owner-confirmed correction windows.&lt;/li&gt;
&lt;li&gt;Breaks when: the scorecard duplicates the source contract and creates two places for the same rule → Mitigation: let the contract define expectations, and let the scorecard record observed behavior against those expectations during the promotion window.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Before one critical model or dashboard depends on a new source, score the source on observed cadence, key stability, corrections, null risk, late arrivals, and owner response, then record the decision as &lt;code&gt;promote&lt;/code&gt;, &lt;code&gt;promote with guardrails&lt;/code&gt;, or &lt;code&gt;hold&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The scorecard is complete only when the downstream owner can see why the table moves forward, moves with guardrails, or stays held.&lt;/p&gt;</content:encoded><category>data-engineering</category><category>data-quality</category><category>data-reliability</category></item><item><title>The change log I publish when a backfill moves reported numbers</title><link>https://berhanturkkaynagi.com/blog/posts/backfill-published-number-change-log/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/backfill-published-number-change-log/</guid><description>When a validated backfill moves reported numbers, I publish a change log with affected ranges, expected movement, evidence, sign-off, and interpretation boundaries.</description><pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Republication pressure starts when a corrected number replaces one people have already used.&lt;/p&gt;
&lt;p&gt;The dashboard is safer to use after the rebuild, but the number has already appeared in a deck, export, or business review. If I republish without a short change log, finance and operations readers are left comparing screenshots and asking whether the metric changed, the business changed, or the team quietly fixed an error.&lt;/p&gt;
&lt;p&gt;My default is to publish the communication record after validation passes. The validation evidence tells me the movement is expected. The change log tells readers what moved and how to use the new number.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Backfills usually get treated as technical work: rebuild the range, compare before and after, confirm the expected movement, and republish the dashboard.&lt;/p&gt;
&lt;p&gt;That is necessary, but it is not enough when the number was already circulating inside the company. A finance lead may have last week’s export. An operations manager may have copied the old dashboard value into a review deck. Someone may notice that April and May changed and assume the definition changed too.&lt;/p&gt;
&lt;p&gt;The trust break is not always the backfill. Sometimes the trust break is the missing explanation after the backfill.&lt;/p&gt;
&lt;p&gt;For decision-critical metrics, I want one compact change log before the republished number becomes the new argument.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;p&gt;I publish a change-log entry when a validated backfill moves a number that already appeared in a dashboard, deck, export, or recurring review.&lt;/p&gt;
&lt;p&gt;The entry has to answer a narrow set of questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Which metric or published surface changed?&lt;/li&gt;
&lt;li&gt;Which date range, cohort, dashboard, or export is affected?&lt;/li&gt;
&lt;li&gt;What direction of movement should readers expect?&lt;/li&gt;
&lt;li&gt;Which validation evidence says the movement is expected?&lt;/li&gt;
&lt;li&gt;Who approved the business interpretation?&lt;/li&gt;
&lt;li&gt;How should readers treat older decks, exports, or screenshots?&lt;/li&gt;
&lt;li&gt;What remains open or under follow-up?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I keep that record separate from the validation workbook. The workbook proves the rebuild is safe enough to publish. The change log helps people interpret the number they can now see.&lt;/p&gt;
&lt;p&gt;The line I care about most is the interpretation boundary. If the backfill corrects shipped-status timing, I want the next review to treat the movement as a timing correction, not as a new demand signal.&lt;/p&gt;
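&lt;p&gt;If the entry lives in a structured record, a completeness check is easy to sketch. The Python below assumes a hypothetical &lt;code&gt;ChangeLogEntry&lt;/code&gt; with one free-text field per question and a convention, also assumed, that an empty field means the question is still unanswered.&lt;/p&gt;

```python
from dataclasses import dataclass, fields


@dataclass
class ChangeLogEntry:
    metric_or_surface: str = ""
    affected_range: str = ""
    expected_movement: str = ""
    validation_evidence: str = ""
    interpretation_approver: str = ""
    older_artifacts_guidance: str = ""
    # Assumed convention: write "none" explicitly when nothing is open.
    open_follow_up: str = ""


def unanswered(entry):
    """Return the names of fields that are still blank."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


entry = ChangeLogEntry(
    metric_or_surface="shipped volume",
    affected_range="2025-02-01 through 2025-04-30",
    expected_movement="February and March increase slightly",
    validation_evidence="row-count and status-transition comparison",
    interpretation_approver="operations owner",
)
print(unanswered(entry))
```

&lt;p&gt;In this sketch the entry is not ready to publish until &lt;code&gt;unanswered&lt;/code&gt; returns an empty list, which keeps the interpretation boundary from being the field that gets skipped.&lt;/p&gt;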
&lt;h2 id=&quot;example-the-change-log-i-want-before-republishing&quot;&gt;Example: the change log I want before republishing&lt;/h2&gt;
&lt;p&gt;Imagine a roughly 90-day order-status backfill covering February through April.&lt;/p&gt;
&lt;p&gt;A source system corrected late carrier acknowledgements. Orders that were stuck in &lt;code&gt;pending&lt;/code&gt; now settle as &lt;code&gt;shipped&lt;/code&gt;. February and March shipped-volume totals increase. April stays close to the prior publish because most corrections were already inside the reporting cutoff.&lt;/p&gt;
&lt;p&gt;The rebuild passes validation. The row-count and status-transition checks match the expected correction window. The before/after comparison shows the movement in the right months.&lt;/p&gt;
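&lt;p&gt;The before/after comparison can be sketched as a check that flags any month moving when it was not expected to. This is a hedged illustration in Python: the function name, the monthly totals, and the 1% tolerance are invented for the example, and a real validation would use whatever threshold the owners signed off on.&lt;/p&gt;

```python
def unexpected_months(before, after, expected_to_move, tolerance=0.01):
    """Return months whose relative change exceeds the tolerance
    even though they were not expected to move.

    Assumes every month in `before` has a nonzero total and also
    appears in `after`.
    """
    flagged = []
    for month in sorted(before):
        change = abs(after[month] - before[month]) / before[month]
        if change > tolerance and month not in expected_to_move:
            flagged.append(month)
    return flagged


# Invented totals: February and March rise, April stays close.
before = {"2025-02": 1000, "2025-03": 1200, "2025-04": 1100}
after = {"2025-02": 1030, "2025-03": 1225, "2025-04": 1102}
expected = {"2025-02", "2025-03"}

print(unexpected_months(before, after, expected))
```

&lt;p&gt;An empty result means the movement sits in the months the correction was supposed to touch. Any flagged month would be held from republication until someone explains it.&lt;/p&gt;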
&lt;p&gt;Now the communication problem starts.&lt;/p&gt;
&lt;p&gt;The weekly operations dashboard, monthly KPI export, and finance-facing reconciliation tab all show the republished values. Someone can reasonably compare them against an older deck. This is the change log I want available before that happens:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Published-number change log&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Metric: shipped volume&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Reason for backfill: source corrected order status for late carrier acknowledgements&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Affected range: 2025-02-01 through 2025-04-30&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Published surfaces touched:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- weekly operations dashboard / shipped volume card&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- monthly KPI export used in the supply review&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- finance-facing volume reconciliation tab&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Expected movement:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- February and March shipped volume should increase slightly&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- April shipped volume should stay close to the prior publish because most corrections landed before month end&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- on-time delivery percentage may move in the same rows, but fill-rate definition is unchanged&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Validation evidence:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- row-count and status-transition comparison completed for February-April partitions&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- before/after metric comparison reviewed by analytics engineering&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- unexpected movement threshold: any month moving outside the signed validation slice gets held from republication&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Owner sign-off:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- analytics engineering approved the rebuild evidence&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- operations owner approved the interpretation note for the review deck&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Interpretation boundary:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- use the new dashboard for February-April comparisons after 2025-05-27&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- do not treat the movement as a new demand signal; it is a correction to shipped-status timing&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- older exports remain historical snapshots and should not be mixed with the republished dashboard without this note&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Open follow-up:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- source owner will confirm whether the carrier acknowledgement correction needs a permanent freshness check&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That note does not explain every row. It gives the reader enough context to stop guessing.&lt;/p&gt;
&lt;p&gt;The affected range names where comparison risk lives. The surfaces list tells dashboard owners and review owners where to expect questions. The expected movement separates a controlled correction from a surprise. The sign-off separates engineering validation from business interpretation.&lt;/p&gt;
&lt;p&gt;The open follow-up matters too. If the source owner still needs to decide whether this correction should become a permanent freshness check, I would rather say that plainly than let the change log sound more final than the system actually is.&lt;/p&gt;
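&lt;p&gt;One way to keep that note honest is to check that the sections a reader depends on are filled in before republication. The sketch below is a minimal illustration, not the tooling behind this change log: &lt;code&gt;ChangeLogEntry&lt;/code&gt;, &lt;code&gt;REQUIRED_SECTIONS&lt;/code&gt;, and &lt;code&gt;missing_sections&lt;/code&gt; are hypothetical names that mirror the headings in the note.&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Hypothetical section names; they mirror the headings in the change log above.
REQUIRED_SECTIONS = (
    "affected_range",
    "expected_movement",
    "validation_evidence",
    "owner_sign_off",
    "interpretation_boundary",
)


@dataclass
class ChangeLogEntry:
    metric: str
    reason: str
    affected_range: str = ""
    surfaces: list[str] = field(default_factory=list)
    expected_movement: list[str] = field(default_factory=list)
    validation_evidence: list[str] = field(default_factory=list)
    owner_sign_off: list[str] = field(default_factory=list)
    interpretation_boundary: list[str] = field(default_factory=list)
    open_follow_up: list[str] = field(default_factory=list)


def missing_sections(entry: ChangeLogEntry) -> list[str]:
    """Return the required sections that are still empty."""
    return [name for name in REQUIRED_SECTIONS if not getattr(entry, name)]


# A draft entry that has not yet stated its interpretation boundary.
entry = ChangeLogEntry(
    metric="shipped volume",
    reason="source corrected order status for late carrier acknowledgements",
    affected_range="2025-02-01 through 2025-04-30",
    expected_movement=["February and March shipped volume should increase slightly"],
    validation_evidence=["before/after metric comparison reviewed by analytics engineering"],
    owner_sign_off=["analytics engineering approved the rebuild evidence"],
)

print(missing_sections(entry))  # ['interpretation_boundary']
```

&lt;p&gt;The open follow-up is deliberately not required in this sketch. An entry can be complete enough to publish while still carrying an open question, as long as the question is written down.&lt;/p&gt;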
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: every low-risk correction gets the same ceremony → Mitigation: reserve the full change log for published dashboards, recurring reviews, finance-facing exports, or metrics with named owners.&lt;/li&gt;
&lt;li&gt;Breaks when: the change log becomes a duplicate validation workbook → Mitigation: summarize expected movement and reference the validation evidence separately instead of listing every row-level difference.&lt;/li&gt;
&lt;li&gt;Breaks when: the technical result is valid but the business meaning is still uncertain → Mitigation: publish a temporary interpretation boundary, name the owner, and record the follow-up instead of pretending the uncertainty is gone.&lt;/li&gt;
&lt;li&gt;Breaks when: teams mix old exports with republished dashboards without context → Mitigation: declare which source is authoritative after republication and label older artifacts as historical snapshots.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Before republishing one backfilled metric, write the change-log entry a reader would need when comparing the new dashboard to last week’s deck.&lt;/p&gt;
&lt;p&gt;If that note cannot name the affected range, expected movement, validation evidence, owner sign-off, and interpretation boundary, the backfill may be technically done while the trust work is still open.&lt;/p&gt;</content:encoded><category>analytics-engineering</category><category>backfills</category><category>metrics</category></item><item><title>When an analytics incident needs a postmortem, not just a note</title><link>https://berhanturkkaynagi.com/blog/posts/analytics-incident-postmortem-trigger/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/analytics-incident-postmortem-trigger/</guid><description>I use a simple trigger table to decide when an analytics incident needs a note, a lightweight review, or a full postmortem based on impact, recurrence, exposure, and unresolved prevention work.</description><pubDate>Sat, 09 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;After the dashboard is fixed, the next decision is whether the incident is actually closed.&lt;/p&gt;
&lt;p&gt;After a data trust break, I still need to decide whether the short incident note is enough. Some incidents need one prevention item and a clean close. Others need a full postmortem because the same failure will come back if nobody names the pattern, owner, and follow-up path.&lt;/p&gt;
&lt;p&gt;My default is to choose the review level from triggers, not from the temperature of the room. A trigger table keeps the response proportional.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Analytics incidents usually fail in one of two directions: too much ceremony for a small miss, or too little learning for a repeated trust break.&lt;/p&gt;
&lt;p&gt;If every late dashboard, stale extract, or corrected number gets a full postmortem, people stop reading them. The process becomes ceremony, and the useful reviews get buried.&lt;/p&gt;
&lt;p&gt;If every incident gets closed as a short note, repeated failures stay invisible. A published number can be corrected twice, each time with a plausible local explanation, while the real prevention work never gets accepted.&lt;/p&gt;
&lt;p&gt;I want the decision rule in writing before the next incident. After the fix, everyone is tired, the meeting clock is loud, and the team is already biased toward either moving on or making the review bigger than it needs to be.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;p&gt;I start with the incident note, then decide whether the incident needs note-only closure, a lightweight review, or a full postmortem.&lt;/p&gt;
&lt;p&gt;The table has to answer the question a lead is actually facing: can we close the note, do we need a short review, or did this incident expose a system weakness that needs a full postmortem?&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Analytics incident escalation table&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Field                         | Note only                              | Lightweight review                         | Full postmortem&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;------------------------------|----------------------------------------|--------------------------------------------|----------------&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Impact                        | no decision, meeting, export, or KPI trust was affected | one team, dashboard, or scheduled review was delayed or temporarily unsafe | a decision, finance process, customer-facing surface, or executive review was affected&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Recurrence                    | first isolated occurrence with understood cause | repeated pattern in one asset or recent near-miss | repeated cross-surface failure or unresolved previous prevention item&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Exposure                      | contained inside the analytics team    | visible to one business owner or operating team | visible outside the immediate team or tied to a formal reporting path&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Customer/executive visibility  | none                                   | possible if the issue is not handled before the next review | confirmed customer, executive, finance-close, board, or external-reporting visibility&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Detection path                | expected alert or check caught it before use | human report found it, or alert lacked enough responder context | users found it, monitoring failed, or detection happened after a bad decision&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Unresolved prevention work    | one clear fix and owner                | prevention needs coordination across analytics and one partner | root cause or prevention path is unclear, cross-team, risky, or under-owned&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Review level                  | close the short incident note          | schedule a short review with timeline and action list | write a blameless postmortem with timeline, contributing causes, impact, owners, and follow-up links&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Owner                         | incident owner closes the note         | analytics lead owns review and action follow-through | incident owner plus accountable analytics, engineering, and business owners&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Next action                   | record fix and prevention in the note  | add one or two follow-up tasks with due dates | track postmortem actions until accepted, rejected, or replaced&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is not a scoring system. I do not add up points and pretend the number made the decision.&lt;/p&gt;
&lt;p&gt;I use the table to make the judgment visible. If impact was low, detection worked, and prevention is obvious, I keep the incident small. If recurrence, executive exposure, failed detection, or unclear prevention shows up, I escalate before the incident becomes folklore.&lt;/p&gt;
&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
&lt;p&gt;Here is the note-only case.&lt;/p&gt;
&lt;p&gt;A daily operations dashboard publishes &lt;code&gt;11&lt;/code&gt; minutes late because an upstream extract lands after its usual window. The dashboard is not used until the afternoon standup. The delay is caught before the meeting. The data publishes cleanly. The prevention item is clear: update the source cutoff note and widen the warning window so the team sees the risk earlier.&lt;/p&gt;
&lt;p&gt;That incident deserves a short note, not a postmortem.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Incident: operations dashboard published 11 minutes late&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Impact: no meeting or decision used stale data&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Recurrence: first isolated delay from this source handoff&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Exposure: contained inside analytics&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Detection: expected freshness check caught it before use&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Prevention: update source cutoff note and warning threshold&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Review level: note only&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Owner: analytics engineer closes note&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Next action: record prevention item and monitor next scheduled run&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A full review would not add much. It would spend more attention than the incident earned, and it would teach the wrong habit: every small delay becomes a meeting instead of a clean note with an owner.&lt;/p&gt;
&lt;p&gt;Here is the case that should not stay note-only.&lt;/p&gt;
&lt;p&gt;A finance-facing revenue number is republished twice in one month after a join change drops a subset of settled refunds. The first incident had a short note and one prevention item. The second reaches the finance close review before the discrepancy is caught, and the corrected number has to be explained to executives. Nobody can say whether the failure sits in release review, metric ownership, detection, or the unfinished prevention item from the first note.&lt;/p&gt;
&lt;p&gt;That incident needs a full postmortem.&lt;/p&gt;
&lt;p&gt;The trigger is not embarrassment. The trigger is the combination of recurrence, executive exposure, a formal finance process, failed detection, and unresolved prevention work across ownership boundaries.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Incident: finance revenue number republished after settled-refund join issue&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Impact: business review used or almost used the wrong number&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Recurrence: second related incident in one month&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Exposure: finance-facing dashboard and review packet&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Detection: discrepancy found by a person, not by the expected check&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Unresolved prevention: unclear whether release review, metric ownership, or join monitoring failed&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Review level: full postmortem&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Owner: analytics incident owner plus finance analytics owner&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Next action: write timeline, name contributing causes, and track accepted prevention work&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The document is not the point. The point is deciding whether the incident revealed a system weakness that a short note cannot close and a vague action item will not prevent.&lt;/p&gt;
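&lt;p&gt;The two incidents above can be run through a small helper that makes the escalation visible. This is a sketch under assumptions, not the table itself: &lt;code&gt;ReviewLevel&lt;/code&gt; and &lt;code&gt;suggested_review_level&lt;/code&gt; are hypothetical names, and the level assigned to each trigger is my reading of the table rows. It does not add up points. It only reports the highest column any trigger landed in and which triggers put it there, and the lead still makes the call.&lt;/p&gt;

```python
from enum import IntEnum


class ReviewLevel(IntEnum):
    NOTE_ONLY = 1
    LIGHTWEIGHT_REVIEW = 2
    FULL_POSTMORTEM = 3


def suggested_review_level(
    triggers: dict[str, ReviewLevel],
) -> tuple[ReviewLevel, list[str]]:
    """Return the highest level any trigger landed on, plus the triggers at that level."""
    if not triggers:
        raise ValueError("classify at least one trigger before choosing a review level")
    level = max(triggers.values())
    reasons = sorted(name for name, value in triggers.items() if value == level)
    return level, reasons


# Illustrative classifications for the two incidents described above.
late_dashboard = {
    "impact": ReviewLevel.NOTE_ONLY,
    "recurrence": ReviewLevel.NOTE_ONLY,
    "exposure": ReviewLevel.NOTE_ONLY,
    "detection_path": ReviewLevel.NOTE_ONLY,
    "unresolved_prevention": ReviewLevel.NOTE_ONLY,
}
finance_revenue = {
    "impact": ReviewLevel.FULL_POSTMORTEM,
    "recurrence": ReviewLevel.FULL_POSTMORTEM,
    "exposure": ReviewLevel.FULL_POSTMORTEM,
    "detection_path": ReviewLevel.FULL_POSTMORTEM,
    "unresolved_prevention": ReviewLevel.FULL_POSTMORTEM,
}

for name, incident in (("late dashboard", late_dashboard), ("finance revenue", finance_revenue)):
    level, reasons = suggested_review_level(incident)
    print(f"{name}: {level.name} because of {', '.join(reasons)}")
```

&lt;p&gt;Returning the reasons alongside the level is the useful part. A suggestion that cannot name its triggers is no easier to defend than the temperature of the room.&lt;/p&gt;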
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Breaks when:&lt;/strong&gt; every small analytics alert gets the same heavy review. → &lt;strong&gt;Mitigation:&lt;/strong&gt; keep note-only and lightweight-review paths legitimate so full postmortems stay worth reading.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Breaks when:&lt;/strong&gt; the postmortem becomes a search for who caused the wrong number. → &lt;strong&gt;Mitigation:&lt;/strong&gt; keep the review blameless and frame the work around missing signals, unclear ownership, weak release checks, and follow-up the team can actually change.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Breaks when:&lt;/strong&gt; repeated small incidents keep looking harmless in isolation. → &lt;strong&gt;Mitigation:&lt;/strong&gt; include recurrence and unresolved prior action items in the trigger table so patterns can roll up before trust erodes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Breaks when:&lt;/strong&gt; the team writes a good review but nobody accepts the prevention work. → &lt;strong&gt;Mitigation:&lt;/strong&gt; attach every review level to an owner, a next action, and a backlog path where the action can be accepted, rejected, or replaced.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;The smallest useful version is a trigger table the team agrees to before the next incident.&lt;/p&gt;
&lt;p&gt;It does not need to be perfect. It needs to answer one question clearly: when is a short note enough, and when would moving on leave the same trust break waiting for the next review?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Pick one recent analytics incident and classify it as one of three levels: note-only, lightweight review, or full postmortem.&lt;/p&gt;
&lt;p&gt;If the answer is hard to defend, the table is the work to do before the next incident forces that decision under pressure.&lt;/p&gt;</content:encoded><category>analytics-engineering</category><category>operations</category><category>data-reliability</category></item><item><title>The first-response runbook I want behind every analytics SLA alert</title><link>https://berhanturkkaynagi.com/blog/posts/sla-alert-response-runbook/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/sla-alert-response-runbook/</guid><description>I keep a short runbook behind analytics SLA alerts so the first responder can see impact, last success, failed step, source freshness, owner, escalation path, and the stakeholder message before the cutoff is missed.</description><pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;At &lt;code&gt;07:34 ET&lt;/code&gt;, a vague SLA alert is already late.&lt;/p&gt;
&lt;p&gt;If the &lt;code&gt;08:00 ET&lt;/code&gt; operations review depends on the dashboard, I do not want the alert to say only that a job missed its schedule. I want the first responder to know whether the output is safe, where the run last succeeded, which boundary failed, who owns the next check, and what message should go to the people waiting on the number.&lt;/p&gt;
&lt;p&gt;The runbook is what makes the alert usable.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Analytics SLA alerts often create urgency without giving the responder a useful first move.&lt;/p&gt;
&lt;p&gt;A scheduler can say the daily KPI refresh missed the &lt;code&gt;07:30 ET&lt;/code&gt; cutoff. That matters, but it does not answer the questions that decide the first response: Did the source land? Did yesterday’s dashboard stay published? Did the failure happen in the transform or the publish step? Is the &lt;code&gt;08:00 ET&lt;/code&gt; review unsafe, or can the team use the last settled snapshot?&lt;/p&gt;
&lt;p&gt;When those answers are missing, the first ten minutes become tool-hopping. One person opens the orchestrator. Another checks the dashboard. Someone asks whether stakeholders should wait. The alert technically worked, but the response started from scratch.&lt;/p&gt;
&lt;p&gt;For decision-critical analytics, I want the runbook card attached before the alert fires.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;p&gt;My default is to keep one short runbook card behind each SLA family that can interrupt a real decision.&lt;/p&gt;
&lt;p&gt;The card has to answer a narrow set of questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What business cutoff makes this alert matter?&lt;/li&gt;
&lt;li&gt;Which dashboard, export, model, or review is affected?&lt;/li&gt;
&lt;li&gt;What was the last successful run or publish?&lt;/li&gt;
&lt;li&gt;Which step failed or became late?&lt;/li&gt;
&lt;li&gt;Is the upstream source fresh enough to trust?&lt;/li&gt;
&lt;li&gt;If a partial publish exists, does the published output still have a normal shape?&lt;/li&gt;
&lt;li&gt;Who is the first responder, who is the backup, and when does escalation start?&lt;/li&gt;
&lt;li&gt;What should stakeholders hear before the cause is fully known?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I do not need a long wiki page at this moment. I need a card that changes the first action.&lt;/p&gt;
&lt;p&gt;If the runbook cannot say what to check first, the alert is not ready to page someone. It may still be a warning, a dashboard marker, or a backlog item. Paging should be reserved for failures that can affect a business cutoff and have a response path attached.&lt;/p&gt;
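&lt;p&gt;That paging rule can be written down as a small gate. The sketch below is a minimal illustration with hypothetical names (&lt;code&gt;RunbookCard&lt;/code&gt;, &lt;code&gt;alert_severity&lt;/code&gt;); it assumes an alert may page only when the card names a business cutoff, a first responder, an escalation path, and at least one first check.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class RunbookCard:
    alert: str
    business_cutoff: str = ""
    first_responder: str = ""
    escalation: str = ""
    first_checks: list[str] = field(default_factory=list)


def alert_severity(card: RunbookCard) -> str:
    """Page only when a cutoff and a response path are both written down."""
    has_response_path = bool(
        card.first_responder and card.escalation and card.first_checks
    )
    if card.business_cutoff and has_response_path:
        return "page"
    # Without a cutoff or a response path, keep the alert visible but do not page.
    return "warning"


card = RunbookCard(
    alert="daily KPI dashboard refresh missed publish cutoff",
    business_cutoff="08:00 ET operations review",
    first_responder="analytics engineering on-call",
    escalation="data platform owner, then business owner",
    first_checks=["source freshness", "failed step", "last publish"],
)

print(alert_severity(card))
print(alert_severity(RunbookCard(alert="nightly archive job ran long")))
```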
&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
&lt;p&gt;For this case, the daily KPI dashboard needs to publish by &lt;code&gt;07:30 ET&lt;/code&gt; for an &lt;code&gt;08:00 ET&lt;/code&gt; operations review.&lt;/p&gt;
&lt;p&gt;At &lt;code&gt;07:34 ET&lt;/code&gt;, the alert fires. This is the card I want behind it:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;SLA alert runbook card&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Alert: daily KPI dashboard refresh missed 07:30 ET publish cutoff&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Business cutoff: 08:00 ET operations review&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Affected output: Daily KPI dashboard / executive summary tiles&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Decision exposure: review should not use today&apos;s dashboard until publish is confirmed&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Last successful run: 2025-03-03 07:18 ET&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Current run: 2025-03-04 started 06:42 ET; failed at 07:24 ET&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Failing step: transform_daily_kpi / publish_mart_daily_kpi&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;First responder: analytics engineering on-call&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Backup / escalation: data platform owner if source freshness is late by 07:40 ET; business owner if dashboard remains unsafe by 07:50 ET&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;First five checks&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;1. Source freshness: finance_extract landed by 06:10 ET? latest observed timestamp?&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;2. Failed step: inspect transform_daily_kpi error and recent code/config change.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;3. Last publish: confirm latest successful published partition and dashboard cache timestamp.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;4. Output shape: compare row count, null-rate, and one KPI total against recent same-weekday band if a partial publish exists.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;5. Recovery choice: rerun, hold yesterday&apos;s snapshot, or mark dashboard unsafe for the 08:00 ET review.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Stakeholder message template&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Status: daily KPI dashboard is &amp;#x3C;safe/unsafe/under review&gt; for the 08:00 ET operations review.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Evidence: last successful publish is &amp;#x3C;timestamp&gt;; current run failed at &amp;#x3C;step&gt;; source freshness is &amp;#x3C;ok/late/unknown&gt;.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Next action: &amp;#x3C;rerun/hold yesterday&apos;s snapshot/investigate failed step&gt;.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Next update: &amp;#x3C;time&gt; from &amp;#x3C;responder&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first useful line is the business cutoff. Without it, the responder cannot tell whether this is a page, a warning, or a note for later cleanup.&lt;/p&gt;
&lt;p&gt;The next useful line is the last successful run. If yesterday’s dashboard is still published and clearly labeled, the team may have a safe fallback for the review. If the latest publish is partial, stale, or cached in a confusing state, the responder should say that early instead of letting people assume the dashboard is current.&lt;/p&gt;
&lt;p&gt;The failing step matters because it prevents the first responder from starting at the wrong layer. If the source extract is late, the next check is upstream freshness and escalation. If the source landed and the transform failed, the next check is the model error, recent change, and rerun path. If the transform finished but the dashboard cache did not refresh, the response is different again.&lt;/p&gt;
&lt;p&gt;The stakeholder template is part of the runbook because communication is not separate from recovery. At &lt;code&gt;07:45 ET&lt;/code&gt;, a calm update is more useful than a perfect root cause that arrives after the review starts.&lt;/p&gt;
&lt;p&gt;The first update can stay this small:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Status: daily KPI dashboard is under review for the 08:00 ET operations review.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Evidence: last successful publish is 2025-03-03 07:18 ET; current run failed at publish_mart_daily_kpi; source freshness is ok.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Next action: analytics engineering is rerunning the failed publish step and checking output shape before clearing the dashboard.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Next update: 07:50 ET from analytics engineering on-call.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That message does not pretend to know the cause. It tells people what is safe, what evidence exists, what happens next, and when they will hear again.&lt;/p&gt;
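&lt;p&gt;The template can also be filled in by code so the responder types values, not sentences. The sketch below is one possible shape, assuming a hypothetical &lt;code&gt;FirstUpdate&lt;/code&gt; class; the allowed states come from the placeholders in the card, and &lt;code&gt;unknown&lt;/code&gt; stays a legal value for source freshness.&lt;/p&gt;

```python
from dataclasses import dataclass

# Allowed values taken from the placeholders in the stakeholder template.
ALLOWED_STATUS = {"safe", "unsafe", "under review"}
ALLOWED_FRESHNESS = {"ok", "late", "unknown"}


@dataclass
class FirstUpdate:
    output: str
    cutoff: str
    status: str
    last_publish: str
    failed_step: str
    source_freshness: str
    next_action: str
    next_update: str
    responder: str

    def render(self) -> str:
        """Build the four-line stakeholder message, rejecting unlisted states."""
        if self.status not in ALLOWED_STATUS:
            raise ValueError(f"status must be one of {sorted(ALLOWED_STATUS)}")
        if self.source_freshness not in ALLOWED_FRESHNESS:
            raise ValueError(
                f"source freshness must be one of {sorted(ALLOWED_FRESHNESS)}"
            )
        return "\n".join([
            f"Status: {self.output} is {self.status} for the {self.cutoff}.",
            f"Evidence: last successful publish is {self.last_publish}; "
            f"current run failed at {self.failed_step}; "
            f"source freshness is {self.source_freshness}.",
            f"Next action: {self.next_action}.",
            f"Next update: {self.next_update} from {self.responder}.",
        ])


update = FirstUpdate(
    output="daily KPI dashboard",
    cutoff="08:00 ET operations review",
    status="under review",
    last_publish="2025-03-03 07:18 ET",
    failed_step="publish_mart_daily_kpi",
    source_freshness="ok",
    next_action=(
        "analytics engineering is rerunning the failed publish step "
        "and checking output shape before clearing the dashboard"
    ),
    next_update="07:50 ET",
    responder="analytics engineering on-call",
)

print(update.render())
```

&lt;p&gt;Rejecting states outside the short allowed lists is a design choice, not a requirement. It keeps a rushed responder from inventing a softer status word than the evidence supports.&lt;/p&gt;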
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: one generic runbook tries to cover every pipeline, dashboard, and alert → Mitigation: keep one card per critical output or SLA family so the cutoff, affected surface, owner, and first checks are specific.&lt;/li&gt;
&lt;li&gt;Breaks when: every warning uses paging severity → Mitigation: page only when the failure can affect a decision, review, export, or dashboard promise; keep lower-severity warnings visible without waking the same responder.&lt;/li&gt;
&lt;li&gt;Breaks when: the stakeholder template hides uncertainty → Mitigation: allow &lt;code&gt;unknown&lt;/code&gt; as a real state and require the next check plus next update time instead of forcing a fake root cause.&lt;/li&gt;
&lt;li&gt;Breaks when: the card names source freshness or output-shape checks that nobody maintains → Mitigation: mark missing signals honestly and assign the owner before treating the alert as production-ready.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Pick one SLA alert that could interrupt a real review and write the card behind it: cutoff, affected output, last success, failed step, source freshness, responder, escalation, and stakeholder message.&lt;/p&gt;
&lt;p&gt;A page that still sends the responder to three tools before they can say whether the dashboard is safe is not finished yet.&lt;/p&gt;</content:encoded><category>analytics-engineering</category><category>operations</category><category>observability</category></item><item><title>The schema-change checklist I use before a source breaks downstream models</title><link>https://berhanturkkaynagi.com/blog/posts/schema-change-checklist-downstream-models/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/schema-change-checklist-downstream-models/</guid><description>I use a schema-change checklist to classify additive fields, renames, type changes, dropped fields, and semantic changes before an upstream source breaks downstream models or dashboards.</description><pubDate>Sat, 02 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Schema changes get expensive when they look harmless at ingest time.&lt;/p&gt;
&lt;p&gt;The load can finish, the row count can land, and the first visible cost can be downstream: a join starts dropping customers, a timestamp watermark skips records, or a dashboard keeps using a field whose meaning moved. I do not want the first serious schema-change review to happen after the model is wrong.&lt;/p&gt;
&lt;p&gt;Before promotion, I classify the change, name the downstream surface, and choose the response on purpose: absorb it, warn consumers, quarantine records, block promotion, coordinate a migration, or run compatibility in parallel.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Schema changes are easy to underreact to when the pipeline stays green.&lt;/p&gt;
&lt;p&gt;An additive field looks harmless until an analyst exposes it before the business meaning is agreed. A rename looks simple until it breaks a watermark or dashboard filter. A type change looks like an implementation detail until a staging join silently casts away the value downstream models depend on.&lt;/p&gt;
&lt;p&gt;The cost is not only the broken run. It is the uncertainty after the run: which models used the field, which dashboard consumed the model, who owns the response, and why the team let that shape move forward without a decision.&lt;/p&gt;
&lt;p&gt;This is different from the initial source agreement I want in &lt;a href=&quot;/blog/posts/minimum-data-contract-source-table/&quot;&gt;a six-part data contract for a source table&lt;/a&gt;. That contract says what the source is supposed to mean. This checklist is what I use when the source moves anyway.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Capture the observed change: source, field, old shape, new shape, and when the change appeared.&lt;/li&gt;
&lt;li&gt;Classify the change before reacting: additive field, rename, type change, dropped field, or semantic change.&lt;/li&gt;
&lt;li&gt;Check downstream use across ingestion, staging models, marts, dashboards, metric definitions, extracts, and known consumers.&lt;/li&gt;
&lt;li&gt;Name the owner for the response. The upstream owner, transformation owner, and business-facing consumer owner may not be the same person.&lt;/li&gt;
&lt;li&gt;Choose the promotion decision explicitly: absorb, warn, quarantine, block promotion, coordinate migration, or run compatibility in parallel.&lt;/li&gt;
&lt;li&gt;Record the evidence next to the pull request, incident note, release comment, validation run, or source-contract update.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The classification matters because not every change deserves the same gate. Unused additive fields can often move into raw or staging with a warning. Key type changes, watermark renames, dropped business fields, and semantic shifts usually need a stronger stop.&lt;/p&gt;
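&lt;p&gt;A minimal sketch of that gate logic, assuming a hypothetical &lt;code&gt;SchemaChange&lt;/code&gt; record and &lt;code&gt;default_gate&lt;/code&gt; helper that are not part of any tool I use. The mapping only illustrates the defaults described above; the named owner still confirms the decision.&lt;/p&gt;

```python
from dataclasses import dataclass

# Classifications named in the checklist above.
CLASSIFICATIONS = {
    "additive field",
    "rename",
    "type change",
    "dropped field",
    "semantic change",
}


@dataclass(frozen=True)
class SchemaChange:
    field: str
    classification: str
    used_downstream: bool  # any known model, dashboard, metric, or extract reads it


def default_gate(change: SchemaChange) -> str:
    """Suggest a starting promotion decision (illustrative defaults only)."""
    if change.classification not in CLASSIFICATIONS:
        raise ValueError(f"unknown classification: {change.classification}")
    if change.classification == "additive field" and not change.used_downstream:
        return "absorb with warning"
    if change.classification == "rename":
        return "coordinate migration"
    # Type changes, dropped fields, semantic shifts, and already-consumed
    # additive fields get the stronger stop until someone decides otherwise.
    return "block promotion"


for change in (
    SchemaChange("customer_id", "type change", True),
    SchemaChange("customer_segment", "additive field", False),
    SchemaChange("created_at", "rename", True),
):
    print(change.field, "=", default_gate(change))
```

&lt;p&gt;The point of the sketch is that the classification is an input to the decision, not the decision itself. A harmless type widening, for example, would need its own branch before this default made sense for it.&lt;/p&gt;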
&lt;p&gt;That is also why row counts are not enough. A schema change can keep the same number of rows while changing join behavior, null behavior, or metric meaning. I still want the output checks from &lt;a href=&quot;/blog/posts/row-counts-are-not-enough/&quot;&gt;the checks I add before I trust a pipeline&lt;/a&gt;, but those checks should not be the first place the structural change is understood.&lt;/p&gt;
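&lt;p&gt;A small illustration with made-up rows, assuming the source starts sending &lt;code&gt;customer_id&lt;/code&gt; as a string while a downstream table still holds numeric keys. Python compares the two types strictly; a warehouse may cast implicitly instead, which is the quieter version of the same problem.&lt;/p&gt;

```python
# Hypothetical rows for illustration only.
customers_after_change = [
    {"customer_id": "1001", "name": "A"},
    {"customer_id": "1002", "name": "B"},
    {"customer_id": "1003", "name": "C"},
]
orders = [
    {"order_id": 1, "customer_id": 1001},
    {"order_id": 2, "customer_id": 1002},
    {"order_id": 3, "customer_id": 1003},
]


def matched_orders(order_rows: list, customer_rows: list) -> int:
    """Count orders whose customer_id has an exact match in the customer rows."""
    known_ids = {c["customer_id"] for c in customer_rows}
    return sum(1 for o in order_rows if o["customer_id"] in known_ids)


row_count = len(customers_after_change)
matches_without_cast = matched_orders(orders, customers_after_change)

# An explicit, visible cast on the staging side restores the join.
orders_with_string_key = [dict(o, customer_id=str(o["customer_id"])) for o in orders]
matches_with_cast = matched_orders(orders_with_string_key, customers_after_change)

print(row_count, matches_without_cast, matches_with_cast)
```

&lt;p&gt;This prints &lt;code&gt;3 0 3&lt;/code&gt;: the row count never moves, the join silently matches nothing, and the explicit cast brings the matches back.&lt;/p&gt;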
&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
&lt;p&gt;Imagine the customer API changes during a normal validation run.&lt;/p&gt;
&lt;p&gt;Three things happen at once: &lt;code&gt;customer_id&lt;/code&gt; changes from a numeric identifier to a string identifier, &lt;code&gt;customer_segment&lt;/code&gt; appears as a new nullable field, and &lt;code&gt;created_at&lt;/code&gt; is renamed to &lt;code&gt;created_timestamp&lt;/code&gt;. Ingestion still lands records, but the change touches the raw load, the staging model, the customer mart, and an executive dashboard filter.&lt;/p&gt;
&lt;p&gt;This is the checklist I want before promotion.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Schema-change classification checklist&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Source: customer API / customers endpoint&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Observed on: 2025-02-11 validation run&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Owner recording decision: analytics engineering&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Change 1&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;field: customer_id&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;old shape: integer-like numeric identifier&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;new shape: string identifier&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;classification: type change&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;critical downstream use: staging joins, customer mart key, dashboard drill-through URL&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;first risk: silent cast or broken join if downstream expects numeric&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;owner: data engineering owns ingest; analytics engineering owns staging model&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;promotion decision: block promotion until raw string is preserved, staging key cast is explicit, and join tests pass&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;follow-up evidence: source contract note, staging PR, validation run, join/null test output&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Change 2&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;field: customer_segment&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;old shape: not present&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;new shape: nullable string&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;classification: additive field&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;critical downstream use: none yet; requested by lifecycle reporting later&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;first risk: low for current dashboards, medium if analysts use it before the definition is certified&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;owner: lifecycle analytics owner confirms definition before mart exposure&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;promotion decision: absorb at raw/staging layer, warn that it is not business-certified yet&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;follow-up evidence: release comment says the field is staged but not curated&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Change 3&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;field: created_at -&gt; created_timestamp&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;old shape: created_at timestamp string&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;new shape: created_timestamp timestamp string&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;classification: rename&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;critical downstream use: incremental load watermark, staging model, cohort dashboard&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;first risk: latest records stop loading or cohorts shift if the old field name is assumed&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;owner: data engineering confirms source change; analytics engineering updates staging compatibility&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;promotion decision: coordinate migration with temporary compatibility alias and alert on old-field disappearance&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;follow-up evidence: PR includes alias removal date, dashboard validation slice, and owner sign-off&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Checklist fields to preserve for every schema change&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- source and observed date&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- field name or semantic rule&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- old shape and new shape&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- classification: additive field, rename, type change, dropped field, semantic change&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- downstream impact: models, dashboards, metrics, exports, or consumers&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- owner: upstream, transformation, and business-facing owner where relevant&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- promotion decision: absorb, warn, quarantine, block promotion, coordinate migration, or run parallel compatibility&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- follow-up evidence: PR, source contract update, validation run, release note, or incident note&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The three changes should not get one blanket decision.&lt;/p&gt;
&lt;p&gt;For &lt;code&gt;customer_id&lt;/code&gt;, I block promotion until the raw string is preserved and the staging model makes the cast explicit. A key is not a cosmetic field. If the downstream mart expects numeric IDs, a quiet cast can create join misses that look like customer churn or dashboard drill-through defects.&lt;/p&gt;
&lt;p&gt;For &lt;code&gt;customer_segment&lt;/code&gt;, I can absorb the field earlier because no current dashboard depends on it. But I still warn consumers that it is not business-certified. The field should not appear in the curated mart until someone owns allowed values, null behavior, and the difference between source-system labels and reporting labels.&lt;/p&gt;
&lt;p&gt;For &lt;code&gt;created_at&lt;/code&gt;, I prefer a compatibility window. The staging model can expose the old name as an alias for a short period while the source owner confirms the rename and the analytics owner updates the watermark, cohort model, and dashboard slice. The important part is that the alias has an owner and a removal date. Otherwise the compatibility layer becomes a permanent hiding place for unfinished migration work.&lt;/p&gt;
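&lt;p&gt;One way to keep that window honest is to store the owner and removal date next to the alias itself. The sketch below assumes a hypothetical &lt;code&gt;ALIAS&lt;/code&gt; record and &lt;code&gt;read_created&lt;/code&gt; helper; the removal date is an invented value, not one from a real migration.&lt;/p&gt;

```python
from datetime import date

# Hypothetical alias record; owner and removal date are illustrative values.
ALIAS = {
    "old_name": "created_at",
    "new_name": "created_timestamp",
    "owner": "analytics engineering",
    "remove_after": date(2025, 3, 11),
}


def read_created(record: dict, today: date) -> tuple:
    """Return (value, notes), tolerating the old field name during the window."""
    notes = []
    if ALIAS["new_name"] in record:
        return record[ALIAS["new_name"]], notes
    if ALIAS["old_name"] in record:
        notes.append("old field name still in use")
        if today > ALIAS["remove_after"]:
            # Past the agreed date the alias is no longer a courtesy; it is debt.
            notes.append(f"alias expired; escalate to {ALIAS['owner']}")
        return record[ALIAS["old_name"]], notes
    raise KeyError("neither created_timestamp nor created_at is present")


value, notes = read_created({"created_at": "2025-02-10T08:00:00Z"}, date(2025, 4, 1))
print(value, notes)
```

&lt;p&gt;A record that already carries &lt;code&gt;created_timestamp&lt;/code&gt; returns no notes. A record that still uses the old name after the removal date returns two, which is the signal I want surfaced rather than absorbed.&lt;/p&gt;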
&lt;p&gt;Dropped fields and semantic changes fit the same checklist even though this API example does not include them. If &lt;code&gt;sales_region&lt;/code&gt; disappears, I want to know which model, metric, export, or owner loses required meaning before anyone fills a fake default. If &lt;code&gt;customer_segment&lt;/code&gt; keeps the same name but changes from lifecycle segment to marketing segment, I treat it as a semantic change and require the same promotion decision I would use for a visible schema break.&lt;/p&gt;
&lt;p&gt;Automated lineage and runtime observability can speed up the investigation. I still want a human-readable decision record, especially when lineage misses spreadsheets, dashboard extracts, or owner-maintained consumer lists. If the issue is already late, stale, or failed at runtime, &lt;a href=&quot;/blog/posts/pipeline-observability-before-more-orchestration/&quot;&gt;pipeline observability signals&lt;/a&gt; help the first responder. This checklist belongs one step earlier: before the team decides that the changed source is safe to promote.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: every additive field is treated as a release blocker → Mitigation: allow unused additive fields into raw and staging with a warning, but hold curated exposure until an owner confirms meaning, allowed values, and null behavior.&lt;/li&gt;
&lt;li&gt;Breaks when: a harmless widening is grouped with destructive type changes → Mitigation: distinguish widening from incompatible casts, preserve raw values, and run join/null checks before promotion.&lt;/li&gt;
&lt;li&gt;Breaks when: lineage misses spreadsheet exports, dashboard extracts, or manually maintained reports → Mitigation: pair automated lineage with a known-consumer list for business-critical models.&lt;/li&gt;
&lt;li&gt;Breaks when: compatibility aliases for renames become permanent → Mitigation: record the alias removal date, owner, and validation slice before the compatibility layer ships.&lt;/li&gt;
&lt;li&gt;Breaks when: a dropped field removes required business meaning and the upstream owner cannot restore it quickly → Mitigation: record degraded mode, consumer warning, and migration owner instead of silently filling fake defaults.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Pick one business-critical source that changed recently and classify the change before the next promotion: old shape, new shape, downstream use, owner, promotion decision, and follow-up evidence.&lt;/p&gt;
&lt;p&gt;If you want to compare notes on schema-change triage, I am most interested in the field that looks low-risk and the model that would prove otherwise.&lt;/p&gt;</content:encoded><category>data-engineering</category><category>data-pipelines</category><category>data-reliability</category></item><item><title>The evidence packet I want before an analytics release is approved</title><link>https://berhanturkkaynagi.com/blog/posts/analytics-release-evidence-packet/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/analytics-release-evidence-packet/</guid><description>I use a small analytics release evidence packet to keep changed assets, validation output, owner approval, rollback notes, and stakeholder context together before a decision-critical release goes live.</description><pubDate>Tue, 28 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A release can pass every automated check and still leave the approval hard to defend.&lt;/p&gt;
&lt;p&gt;That is the analytics release failure I care about here: the code merged, the pipeline stayed green, the dashboard changed, and three weeks later nobody can reconstruct why the release was considered safe. The evidence exists somewhere, but it is split across pull request comments, CI logs, screenshots, owner messages, and a rollback note that never made it into the same record.&lt;/p&gt;
&lt;p&gt;For finance-facing or operations-facing analytics releases, I want one small packet before promotion. Not a ceremony. A packet.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;The approval moment is often thinner than the change deserves.&lt;/p&gt;
&lt;p&gt;A pull request says tests passed. A dashboard screenshot sits in a thread. A finance owner writes “looks good” after checking one slice. The rollback path is known by the engineer who shipped it. Each piece may be reasonable on its own, but the release decision is still fragile because the pieces are not tied together.&lt;/p&gt;
&lt;p&gt;The cost shows up later. When a finance export, executive dashboard, or operations metric is questioned, the team has to do archaeology before it can answer the actual question: did the live number behave the way the release note said it would?&lt;/p&gt;
&lt;p&gt;That is why my default is to approve decision-critical analytics changes from an evidence packet, not from a green badge alone.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Name the release boundary: pull request, changed models, semantic definitions, dashboard cards, exports, and downstream report surfaces.&lt;/li&gt;
&lt;li&gt;Attach validation evidence that explains the business-facing change, not just the automation status.&lt;/li&gt;
&lt;li&gt;Separate engineering approval from owner approval. The delivery path and the visible metric interpretation are related, but they are not the same decision.&lt;/li&gt;
&lt;li&gt;Write the stakeholder-facing release comment before promotion: what should move, what should not move, and where readers should look if they compare against an older deck or export.&lt;/li&gt;
&lt;li&gt;Keep the rollback or recovery path boring: revert reference, validation rerun, first responder, and owner for stakeholder clarification.&lt;/li&gt;
&lt;li&gt;Link to durable evidence instead of pasting every log line into the packet.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is the cross-layer version of release control. A dashboard-specific checklist still matters, and so do a dbt deployment record and an Azure DevOps check record. The packet is where I make those pieces answer one approval question.&lt;/p&gt;
&lt;h2 id=&quot;example-the-packet-i-want-in-the-release-comment&quot;&gt;Example: the packet I want in the release comment&lt;/h2&gt;
&lt;p&gt;Imagine a release that changes a finance-facing revenue dashboard. The code diff is small, but the release touches one dbt model, one semantic definition, and one dashboard card.&lt;/p&gt;
&lt;p&gt;That is enough surface area for the approval reason to get lost.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Analytics release evidence packet&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Release: revenue dashboard net revenue definition update&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;PR link: repository pull request #418&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Changed assets:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- dbt model: mart_finance_revenue&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- semantic definition: net_revenue excludes refunded order lines after settlement&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- dashboard card: executive revenue scorecard / Net revenue by week&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Validation evidence:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- CI: lint, unit tests, dbt compile, and changed-model build passed&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- dbt comparison: mart_finance_revenue row count unchanged for validation slice&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- metric comparison: net revenue moved -0.42% for 2025-01-06 to 2025-01-19, expected from settled refunds&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- dashboard screenshot diff: only Net revenue by week and Revenue mix cards changed&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- owner review: finance analytics lead approved the validation slice and release note&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Release comment:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- visible change: net revenue may decrease slightly for weeks with settled refunds&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- not changing: booked gross revenue, customer count, and order volume cards&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- stakeholder note: finance review should use the release comment if January revenue is compared to last week&apos;s deck&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Rollback / recovery:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- rollback reference: revert PR #418 and restore previous semantic definition artifact&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- recovery check: rerun dashboard validation slice after rollback&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- first responder: analytics engineering owns rollback; finance analytics lead owns stakeholder clarification&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The packet stays short because it has one job: tie the approval to durable evidence.&lt;/p&gt;
&lt;p&gt;The pull request and CI run still hold the delivery evidence. The dashboard screenshot still proves the visible output. The owner approval still belongs to the person accountable for the finance interpretation. The packet ties those records together so the approval can be reconstructed without searching five tools.&lt;/p&gt;
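&lt;p&gt;If the packet lives in a repository-backed artifact, a completeness check can be very small. This is a sketch under assumptions: the section names and the &lt;code&gt;missing_sections&lt;/code&gt; helper are hypothetical, and the packet values are shortened from the example above.&lt;/p&gt;

```python
# Hypothetical section names, following the packet outline in this post.
REQUIRED_SECTIONS = (
    "changed_assets",
    "validation_evidence",
    "owner_approval",
    "stakeholder_note",
    "rollback_or_recovery",
)


def missing_sections(packet: dict) -> list:
    """List required sections that are absent or empty."""
    return [name for name in REQUIRED_SECTIONS if not packet.get(name)]


packet = {
    "changed_assets": ["dbt model: mart_finance_revenue"],
    "validation_evidence": ["CI passed", "net revenue moved -0.42% on the validation slice"],
    "owner_approval": "finance analytics lead",
    "stakeholder_note": "net revenue may decrease slightly for weeks with settled refunds",
    "rollback_or_recovery": "",  # left empty on purpose to show the check
}

print(missing_sections(packet))
```

&lt;p&gt;This prints &lt;code&gt;[&apos;rollback_or_recovery&apos;]&lt;/code&gt;. The check says nothing about whether the evidence is good, only whether the approval has somewhere to point.&lt;/p&gt;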
&lt;p&gt;The line I look for first is not the test result. It is the expected visible movement.&lt;/p&gt;
&lt;p&gt;If net revenue moves by about 0.4 percent for the validation slice and the owner agrees that the movement comes from settled refunds, I have a release decision I can defend. If the same release has green tests but nobody stated the expected movement, I still do not have enough evidence.&lt;/p&gt;
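&lt;p&gt;The comparison behind that line is simple arithmetic. The sketch below uses invented revenue totals chosen to reproduce the -0.42% figure from the packet, and the 0.05 percentage-point tolerance is an assumption, not a recommendation.&lt;/p&gt;

```python
from typing import Optional


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100


def movement_is_expected(
    observed_pct: float,
    expected_pct: Optional[float],
    tolerance_pct: float = 0.05,  # assumed tolerance, in percentage points
) -> bool:
    """True only when an expectation was stated and the observation is close to it."""
    if expected_pct is None:
        # Green tests without a stated expectation are not enough evidence.
        return False
    return tolerance_pct >= abs(observed_pct - expected_pct)


# Hypothetical totals for the validation slice, before and after the release.
observed = percent_change(1_000_000.0, 995_800.0)
print(round(observed, 2))
print(movement_is_expected(observed, expected_pct=-0.42))
print(movement_is_expected(observed, expected_pct=None))
```

&lt;p&gt;With these totals the observed movement rounds to -0.42, the first check passes, and the second fails because no expectation was recorded.&lt;/p&gt;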
&lt;p&gt;This is where the packet differs from a BI-only checklist. For dashboard-specific output review, I still want &lt;a href=&quot;/blog/posts/bi-dashboard-release-checks/&quot;&gt;a dashboard release checklist before a BI change goes live&lt;/a&gt;. For model promotion, I still want &lt;a href=&quot;/blog/posts/dbt-core-deployments-boring-production/&quot;&gt;a boring dbt deployment record&lt;/a&gt;. For delivery-system enforcement, I still want &lt;a href=&quot;/blog/posts/azure-devops-checks-analytics-code-production/&quot;&gt;Azure DevOps checks before analytics code reaches production&lt;/a&gt;. The release packet does not replace those artifacts. It collects the approval evidence that crosses them.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: every low-risk copy edit, label change, or exploratory dashboard tweak gets the full packet → Mitigation: reserve the strict packet for metric definitions, executive dashboards, finance-facing marts, and decision-critical semantic changes.&lt;/li&gt;
&lt;li&gt;Breaks when: the packet turns into a stale template nobody reads → Mitigation: keep only evidence someone would need during rollback, dispute, or stakeholder explanation.&lt;/li&gt;
&lt;li&gt;Breaks when: links point to expiring CI output, private chat threads, or screenshots nobody can find later → Mitigation: keep the approval record in the pull request, release note, or repository-backed artifact, and link only evidence that will still be available when the number is questioned.&lt;/li&gt;
&lt;li&gt;Breaks when: the change cannot be cleanly rolled back because it includes a source correction or historical backfill → Mitigation: record the recovery path instead: affected surfaces, restatement window, validation rerun, communication owner, and first responder.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Pick one decision-critical analytics release and write the packet you would want to read three weeks later: changed assets, validation evidence, owner approval, stakeholder note, and rollback or recovery path.&lt;/p&gt;
&lt;p&gt;The approval is safer when the evidence can outlive the Slack thread around the release.&lt;/p&gt;</content:encoded><category>analytics-engineering</category><category>operations</category><category>data-reliability</category></item><item><title>The metadata-driven pipeline decisions I would revisit before moving ADF patterns into Fabric Data Factory</title><link>https://berhanturkkaynagi.com/blog/posts/metadata-driven-adf-fabric-data-factory/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/metadata-driven-adf-fabric-data-factory/</guid><description>Before moving an ADF metadata framework into Fabric Data Factory, I separate durable operating decisions from platform-specific connections, targets, schedules, and validation.</description><pubDate>Mon, 27 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The expensive version of an ADF-to-Fabric migration is the one where the control table keeps working just long enough to hide what changed.&lt;/p&gt;
&lt;p&gt;A metadata-driven ADF framework can reduce duplicated pipelines. It can also turn source selection, connection behavior, dataset shape, schedules, validation, and ownership into a second programming language. When that happens, the migration plan starts preserving the abstraction before the team has decided whether the abstraction is still honest.&lt;/p&gt;
&lt;p&gt;My thesis is simple: keep metadata for durable operating decisions, and delete or make explicit the ADF-specific indirection before rebuilding the pattern in Fabric Data Factory.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;The migration trap is treating a metadata-driven ADF framework as one reusable asset.&lt;/p&gt;
&lt;p&gt;In ADF, a team may have used one control table to drive linked services, parameterized datasets, target paths, table names, watermarks, schedules, and validation rules through a reusable copy pipeline. That pattern can be useful. It keeps repeated pipeline objects under control.&lt;/p&gt;
&lt;p&gt;It also makes some decisions too easy to miss. If &lt;code&gt;server_name&lt;/code&gt;, &lt;code&gt;database_name&lt;/code&gt;, &lt;code&gt;dataset_name&lt;/code&gt;, and &lt;code&gt;schedule_name&lt;/code&gt; are just fields in a row, the team may forget that Fabric Data Factory does not model those pieces the same way.&lt;/p&gt;
&lt;p&gt;As of my 2026-04-27 source check, Microsoft’s migration planning guidance says Fabric Data Factory defines dataset properties inline within activities, replaces linked services with Fabric connections, uses variable libraries instead of ADF global parameters, and handles scheduling differently from ADF. The same guidance says manual migration is necessary for complex environments and low-parity patterns (&lt;a href=&quot;https://learn.microsoft.com/en-us/fabric/data-factory/migrate-planning-azure-data-factory&quot;&gt;Microsoft Learn, migration planning&lt;/a&gt;, last updated 2026-04-11).&lt;/p&gt;
&lt;p&gt;That does not make metadata-driven pipelines bad. It means I would stop asking, “Can we migrate the framework?” and start asking, “Which fields still describe how we operate this data path?”&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;p&gt;My default is to turn one old ADF control-table row into a migration decision record before rebuilding anything in Fabric.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Inventory each field by job: source identity, connection behavior, dataset or path behavior, load pattern, target shape, schedule, validation, ownership, and migration status.&lt;/li&gt;
&lt;li&gt;Keep metadata that the team still needs to review deliberately: source identity, load pattern, watermark policy, validation evidence, owner, escalation path, and migration status.&lt;/li&gt;
&lt;li&gt;Move platform-specific choices into Fabric-owned artifacts or documented conventions: connections, inline activity settings, variable libraries, workspace rules, and deployment mappings.&lt;/li&gt;
&lt;li&gt;Choose the target shape before preserving the orchestration pattern. The path might land in a Lakehouse table, feed a Warehouse table, use Copy job, call a notebook for merge behavior, or stay in ADF for now.&lt;/li&gt;
&lt;li&gt;Rebuild schedules and validation as production controls, not as leftovers from the old trigger framework.&lt;/li&gt;
&lt;li&gt;Give each pipeline a migration status such as &lt;code&gt;migrate&lt;/code&gt;, &lt;code&gt;needs-review&lt;/code&gt;, &lt;code&gt;redesign-required&lt;/code&gt;, &lt;code&gt;keep-mounted-adf-for-now&lt;/code&gt;, or &lt;code&gt;delete-abstraction&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The source notes matter because this is platform behavior, not a timeless design law. As of 2026-04-27, Microsoft’s ADF upgrade page categorizes pipelines as &lt;code&gt;Ready&lt;/code&gt;, &lt;code&gt;Needs review&lt;/code&gt;, &lt;code&gt;Coming soon&lt;/code&gt;, or &lt;code&gt;Not compatible&lt;/code&gt;. It also says dynamic linked services, highly dynamic linked-service patterns, and dataset-driven metadata patterns cannot migrate as-is through the UX-based path (&lt;a href=&quot;https://learn.microsoft.com/en-us/azure/data-factory/how-to-upgrade-your-azure-data-factory-pipelines-to-fabric-data-factory&quot;&gt;Microsoft Learn, upgrade ADF pipelines to Fabric&lt;/a&gt;, last updated 2026-04-21). The comparison page frames Fabric as connection-based and dataset-free, with data properties defined inline in activities (&lt;a href=&quot;https://learn.microsoft.com/en-us/fabric/data-factory/compare-fabric-data-factory-and-azure-data-factory&quot;&gt;Microsoft Learn, ADF/Fabric differences&lt;/a&gt;, last updated 2026-03-31).&lt;/p&gt;
&lt;p&gt;So the decision is not “metadata or no metadata.” The decision is where the metadata earns its keep.&lt;/p&gt;
&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
&lt;p&gt;Here is the kind of ADF control-table row I would not migrate blindly.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;ADF control-table row before migration&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;source_system: erp_sql&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;linked_service_name: ls_sql_dynamic&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;server_name: erp-prod-sql-01&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;database_name: operations&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;schema_name: dbo&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;table_name: PurchaseOrders&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;dataset_name: ds_sql_table_dynamic&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;target_path: raw/erp/purchase_orders/&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;watermark_column: LastModifiedUtc&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;load_type: incremental&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;schedule_name: daily_0500_eastern&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;validation_rule: row_count_plus_watermark&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;owner_team: analytics_platform&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That row mixes four different jobs.&lt;/p&gt;
&lt;p&gt;First, it names the data path: &lt;code&gt;source_system&lt;/code&gt;, &lt;code&gt;schema_name&lt;/code&gt;, &lt;code&gt;table_name&lt;/code&gt;, and &lt;code&gt;watermark_column&lt;/code&gt; tell me what source is moving and how change is detected. Those fields still belong in a migration discussion.&lt;/p&gt;
&lt;p&gt;Second, it hides connection behavior: &lt;code&gt;linked_service_name&lt;/code&gt;, &lt;code&gt;server_name&lt;/code&gt;, and &lt;code&gt;database_name&lt;/code&gt; may have been useful ADF indirection, but I would not carry them forward as runtime-switchable strings unless the team can explain the operational reason. In Fabric, I want the governed connection reference to be visible.&lt;/p&gt;
&lt;p&gt;Third, it blurs target shape: &lt;code&gt;target_path&lt;/code&gt; says where raw data landed in the old pattern, but it does not answer whether the Fabric target is a Lakehouse table, a Warehouse table, both, or neither.&lt;/p&gt;
&lt;p&gt;Fourth, it under-specifies production ownership. &lt;code&gt;validation_rule&lt;/code&gt; and &lt;code&gt;owner_team&lt;/code&gt; are a start, but I also want the failed-run owner, source-contract owner, reporting sign-off owner, rerun note, and escalation path. Migration is where I would make those fields boring and explicit.&lt;/p&gt;
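&lt;p&gt;The same split can be done mechanically as a first pass. This sketch groups the example row by the four jobs described above; the &lt;code&gt;JOBS&lt;/code&gt; mapping and &lt;code&gt;split_by_job&lt;/code&gt; helper are hypothetical names, and any field the text does not assign to a job is reported as unassigned rather than guessed.&lt;/p&gt;

```python
# The control-table row from the example above.
ROW = {
    "source_system": "erp_sql",
    "linked_service_name": "ls_sql_dynamic",
    "server_name": "erp-prod-sql-01",
    "database_name": "operations",
    "schema_name": "dbo",
    "table_name": "PurchaseOrders",
    "dataset_name": "ds_sql_table_dynamic",
    "target_path": "raw/erp/purchase_orders/",
    "watermark_column": "LastModifiedUtc",
    "load_type": "incremental",
    "schedule_name": "daily_0500_eastern",
    "validation_rule": "row_count_plus_watermark",
    "owner_team": "analytics_platform",
}

# Field-to-job mapping, following the four jobs named in the text.
JOBS = {
    "data path": ["source_system", "schema_name", "table_name", "watermark_column"],
    "connection behavior": ["linked_service_name", "server_name", "database_name"],
    "target shape": ["target_path"],
    "production ownership": ["validation_rule", "owner_team"],
}


def split_by_job(row: dict) -> dict:
    """Group row fields by job and collect the ones nobody has classified yet."""
    grouped = {
        job: {name: row[name] for name in fields if name in row}
        for job, fields in JOBS.items()
    }
    assigned = {name for fields in JOBS.values() for name in fields}
    grouped["unassigned"] = {k: v for k, v in row.items() if k not in assigned}
    return grouped


grouped = split_by_job(ROW)
print(list(grouped["unassigned"]))
```

&lt;p&gt;This prints &lt;code&gt;[&apos;dataset_name&apos;, &apos;load_type&apos;, &apos;schedule_name&apos;]&lt;/code&gt;. Those are the fields I would take to the migration discussion first, because nobody has yet said which decision they carry.&lt;/p&gt;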
&lt;p&gt;As of 2026-04-27, Microsoft’s global-parameter migration guide says ADF global parameters need manual steps, expression references must be updated, and workspace variables should not be overloaded with run-time values (&lt;a href=&quot;https://learn.microsoft.com/en-us/fabric/data-factory/convert-global-parameters-to-variable-libraries&quot;&gt;Microsoft Learn, global parameters to variable libraries&lt;/a&gt;, last updated 2026-04-11). That is the kind of boundary I want the record to expose, not hide.&lt;/p&gt;
&lt;p&gt;The Fabric-ready record is less clever and more operational.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Fabric migration decision record&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;source_identity: erp_sql / PurchaseOrders&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;fabric_connection_reference: ops-sql-prod-connection&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;connection_strategy: explicit connection per governed source, not dynamic server/database strings&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;load_pattern: incremental copy to Lakehouse staging, with notebook or Warehouse merge only where needed&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;target_shape: Bronze Lakehouse table -&gt; curated Warehouse table if reporting needs SQL serving&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;watermark_policy: LastModifiedUtc plus replay/backfill note&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;schedule_policy: pipeline-local schedule; no reusable central trigger assumption&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;validation_evidence: row count, max watermark, schema drift check, failed-run owner, rerun note&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;owner: analytics_platform owns orchestration; source owner owns source contract; reporting owner signs off curated output&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;escalation_path: source connectivity -&gt; platform; source data change -&gt; source owner; reporting mismatch -&gt; analytics owner&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;migration_status: redesign-required because the dynamic linked-service and dataset-driven pattern cannot migrate as-is&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The new record does not pretend every old field deserves a new home.&lt;/p&gt;
&lt;p&gt;It keeps &lt;code&gt;source_identity&lt;/code&gt;, &lt;code&gt;load_pattern&lt;/code&gt;, &lt;code&gt;watermark_policy&lt;/code&gt;, &lt;code&gt;validation_evidence&lt;/code&gt;, &lt;code&gt;owner&lt;/code&gt;, and &lt;code&gt;escalation_path&lt;/code&gt; because those decisions survive the platform move. It moves connection choice into an explicit governed Fabric connection reference. It makes target shape a first-class decision instead of burying the answer in a path string. It records schedule policy and migration status so no one confuses a successful assessment with production readiness.&lt;/p&gt;
&lt;p&gt;That last point matters. As of my 2026-04-27 source check, Microsoft’s upgrade guidance says post-migration work includes validating connections, re-enabling and configuring triggers, running end-to-end tests, and validating in nonproduction before production cutover (&lt;a href=&quot;https://learn.microsoft.com/en-us/azure/data-factory/how-to-upgrade-your-azure-data-factory-pipelines-to-fabric-data-factory&quot;&gt;Microsoft Learn, upgrade ADF pipelines to Fabric&lt;/a&gt;, last updated 2026-04-21). Microsoft’s global-parameter guide separately says ADF global parameters are not automatically migrated and need deliberate re-authoring into variable libraries (&lt;a href=&quot;https://learn.microsoft.com/en-us/fabric/data-factory/convert-global-parameters-to-variable-libraries&quot;&gt;Microsoft Learn, global parameters to variable libraries&lt;/a&gt;, last updated 2026-04-11). I would rather keep those checks on the decision record than treat migration as proof that the pipeline is safe.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Default:&lt;/strong&gt; Keep a control table for source identity, load pattern, watermark policy, validation evidence, owner, escalation path, and migration status. &lt;strong&gt;Breaks when:&lt;/strong&gt; the table starts storing every platform-specific knob. &lt;strong&gt;Mitigation:&lt;/strong&gt; move connection, schedule, deployment, and target-shape conventions into Fabric-owned artifacts or documented workspace rules.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Default:&lt;/strong&gt; Use explicit Fabric connections for governed sources. &lt;strong&gt;Breaks when:&lt;/strong&gt; the old ADF pattern relied on parameterized linked services to swap server, database, or authentication context at runtime. &lt;strong&gt;Mitigation:&lt;/strong&gt; create separate governed connection references and keep the source-selection decision visible in metadata instead of hiding it inside dynamic connection strings.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Default:&lt;/strong&gt; Define file, table, schema, and copy settings close to the Fabric activity that uses them. &lt;strong&gt;Breaks when:&lt;/strong&gt; the team tries to recreate ADF reusable dataset objects as a parallel metadata language. &lt;strong&gt;Mitigation:&lt;/strong&gt; keep reusable naming conventions, but let Fabric inline properties and activities own the concrete shape.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Default:&lt;/strong&gt; Use variable libraries for environment constants and pipeline parameters for run-specific values. &lt;strong&gt;Breaks when:&lt;/strong&gt; the team assumes old global parameters migrate automatically. &lt;strong&gt;Mitigation:&lt;/strong&gt; record each global parameter, decide whether it is an environment constant or a runtime decision, and rewrite references deliberately.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Default:&lt;/strong&gt; Keep scheduling boring and pipeline-local unless there is a strong reason to centralize orchestration elsewhere. &lt;strong&gt;Breaks when:&lt;/strong&gt; the ADF framework depended on reusable triggers, tumbling windows, dependency triggers, or backfill semantics. &lt;strong&gt;Mitigation:&lt;/strong&gt; rebuild schedules, backfills, and trigger metadata explicitly after assessment rather than assuming parity.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Default:&lt;/strong&gt; Validate after migration with row counts, watermarks, schema checks, owner review, and a rerun note. &lt;strong&gt;Breaks when:&lt;/strong&gt; the migration assessment says a pipeline is supported and the team treats that as production proof. &lt;strong&gt;Mitigation:&lt;/strong&gt; run nonproduction end-to-end validation and keep the result in the migration decision record.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Take one metadata-driven ADF control-table row and mark every field as &lt;code&gt;keep&lt;/code&gt;, &lt;code&gt;make explicit in Fabric&lt;/code&gt;, or &lt;code&gt;delete&lt;/code&gt;; only then decide whether the pipeline should migrate, be redesigned, stay mounted for now, or be retired.&lt;/p&gt;
&lt;p&gt;If the row cannot name the owner, validation evidence, target shape, and migration status, I would fix the row before I trusted the migration plan.&lt;/p&gt;</content:encoded><category>data-pipelines</category><category>data-platforms</category><category>analytics-engineering</category></item><item><title>The checks I add to supply chain data before planners trust it</title><link>https://berhanturkkaynagi.com/blog/posts/supply-chain-data-trust-checks/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/supply-chain-data-trust-checks/</guid><description>Before planners use a supply-chain feed, I gate it with freshness, row-count, and business-rule checks for duplicates, UOMs, locations, inventory, lifecycle, demand grain, and RM/FG classification.</description><pubDate>Sun, 26 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;My first artifact for &lt;code&gt;item_location_week_supply_plan&lt;/code&gt; is a compact planner trust check card. Buy, expedite, and reallocate decisions stay blocked until every line on that card is green.&lt;/p&gt;
&lt;p&gt;I have watched a planning feed arrive on time with expected row counts and still be unsafe to use. The file can look healthy at the table level while business-rule defects point planners toward the wrong move.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;A planning feed can be fresh, complete by row count, and still unsafe for action.&lt;/p&gt;
&lt;p&gt;I keep this boundary separate from two other supply-chain posts on purpose:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;KPI-definition boundary:&lt;/strong&gt; in &lt;a href=&quot;/blog/posts/supply-chain-numbers-business-review/&quot;&gt;The supply chain numbers I define before they reach a business review&lt;/a&gt;, I decide metric meaning, timing, exclusions, grain, and owner before a review.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Restatement-window boundary:&lt;/strong&gt; in &lt;a href=&quot;/blog/posts/late-arriving-supply-chain-data/&quot;&gt;Handling late-arriving supply chain data without rewriting history by hand&lt;/a&gt;, I decide how late events restate history inside a declared correction window.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this post, I decide whether the planning slice is safe enough for planners to act on &lt;strong&gt;right now&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;p&gt;I run a check ladder in a fixed order.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;First gate: freshness and row counts. I use them as entry gates only. They are necessary, but they are explicitly insufficient for planner trust. This is the same premise I use in &lt;a href=&quot;/blog/posts/row-counts-are-not-enough/&quot;&gt;Row counts are not enough: the checks I add before I trust a pipeline&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Then I run seven business-rule categories before I clear planning actions:
&lt;ol&gt;
&lt;li&gt;Duplicate purchase orders&lt;/li&gt;
&lt;li&gt;UOM conversion integrity&lt;/li&gt;
&lt;li&gt;Missing location mappings&lt;/li&gt;
&lt;li&gt;Negative or impossible inventory states&lt;/li&gt;
&lt;li&gt;Order-status lifecycle validity&lt;/li&gt;
&lt;li&gt;Forecast vs actual demand grain alignment&lt;/li&gt;
&lt;li&gt;Raw material vs finished-goods classification integrity&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;For every failed category, I attach one blocked action and one first-response owner so planners are not left guessing whether the slice can be used.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
&lt;p&gt;In one weekly slice of &lt;code&gt;item_location_week_supply_plan&lt;/code&gt;, &lt;code&gt;FG-104&lt;/code&gt; at &lt;code&gt;WEST-03&lt;/code&gt; looked healthy at the table level (&lt;code&gt;freshness=PASS&lt;/code&gt;, &lt;code&gt;row_count=PASS&lt;/code&gt;) and still failed planner trust.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Planner trust check card&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Feed: item_location_week_supply_plan&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Slice: FG-104 @ WEST-03&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Entry gates: freshness=PASS, row_count=PASS (insufficient alone)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;1) Duplicate purchase orders&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Trigger condition: two active lines share (po_number, line_id, item_id, location_id, required_date) after an EDI 850 resend lands with a new message_id&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Failure description: the same inbound line appears twice and inflates expected receipts&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Planner decision error it causes: planner skips a needed expedite because inbound looks bigger than reality&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- First response: procurement data owner inactivates the duplicate line and republishes the slice&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;2) UOM conversion integrity&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Trigger condition: supplier EDI 856 arrives in CASE, ERP posts receipts in EA, and case-pack mapping is stale for FG-104&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Failure description: each receipt is multiplied by the old case-pack value and creates phantom on-hand overnight&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Planner decision error it causes: planner delays replenishment because phantom safety stock hides a true shortage&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- First response: master-data/UOM owner corrects CASE→EA mapping and reruns the planning model&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;3) Missing location mappings&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Trigger condition: WMS emits alias location `W03-RCV`, but canonical mapping to `WEST-03` is missing in the hierarchy bridge&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Failure description: receipts and on-hand fall into an unmapped bucket and disappear from the WEST-03 planning view&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Planner decision error it causes: planner launches an unnecessary inter-DC reallocation for a shortage that is only a mapping miss&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- First response: location-master owner backfills the alias mapping and revalidates affected keys&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;4) Negative or impossible inventory states&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Trigger condition: FG available quantity goes negative without an approved reason code after issue transactions post before in-transit settlement closes&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Failure description: timing-gap negatives are mixed with true data defects, so FG-104 shows an impossible shortage state&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Planner decision error it causes: planner triggers the wrong expedite instead of waiting for the settlement window to close&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- First response: inventory-control owner verifies transaction chain and reason codes before release&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;5) Order-status lifecycle validity&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Trigger condition: order cancelled in OMS, but the cancellation event is dropped in transit and the line stays OPEN in planning facts&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Failure description: the cancelled line sits permanently in the open bucket and inflates expected backlog&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Planner decision error it causes: planner over-allocates capacity and commits an unnecessary expedite to chase phantom backlog&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- First response: order-management owner replays cancellation events, reconciles status snapshots, and re-emits order facts&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;6) Forecast vs actual demand grain alignment&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Trigger condition: forecast is uploaded at SKU-region grain when replenishment consumes SKU-DC grain&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Failure description: missing DC rows are treated as zero demand, triggering unnecessary safety-stock recommendations&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Planner decision error it causes: planner launches unneeded replenishment transfers into DCs that are not actually short&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- First response: planning analytics owner disaggregates SKU-region forecast to SKU-DC with the approved split key before recomputing joins&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;7) Raw material vs finished-goods classification integrity&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Trigger condition: RM-881 is misclassified as FG during a product-master merge and enters FG availability logic without a transformation rule&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Failure description: raw material is counted as sellable finished stock for FG-104&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Planner decision error it causes: planner misses a real finished-goods shortage and delays the buy signal&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- First response: product-classification owner restores RM/FG boundary and reruns publish checks&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Blocked planning actions while any line is FAIL:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Buy: blocked&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Expedite: blocked&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Reallocate: blocked&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This card guards against false confidence: the feed can look healthy at the table level and still be unsafe for planner decisions.&lt;/p&gt;
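&lt;p&gt;As a rough illustration of category 1, here is a minimal Python sketch of the duplicate check, assuming the slice is already loaded as a list of dicts. The key fields come from the card’s trigger condition; the &lt;code&gt;status&lt;/code&gt; field, the values in &lt;code&gt;ACTIVE_STATUSES&lt;/code&gt;, and both function names are hypothetical placeholders, not part of any real feed contract.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;from collections import defaultdict&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;# Hypothetical status values; the card only says &quot;active lines&quot;.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;ACTIVE_STATUSES = {&quot;OPEN&quot;, &quot;CONFIRMED&quot;}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;# Business key taken from the card&#39;s trigger condition.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;KEY_FIELDS = (&quot;po_number&quot;, &quot;line_id&quot;, &quot;item_id&quot;, &quot;location_id&quot;, &quot;required_date&quot;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;def find_duplicate_po_lines(lines):&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    &quot;&quot;&quot;Return business keys shared by more than one active line.&quot;&quot;&quot;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    groups = defaultdict(list)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    for line in lines:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;        if line[&quot;status&quot;] in ACTIVE_STATUSES:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;            key = tuple(line[field] for field in KEY_FIELDS)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;            # A resend lands with a new message_id under the same key.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;            groups[key].append(line[&quot;message_id&quot;])&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    return {key: ids for key, ids in groups.items() if len(ids) &gt; 1}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;def blocked_actions(duplicates):&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    &quot;&quot;&quot;Block buy, expedite, and reallocate while any duplicate is unresolved.&quot;&quot;&quot;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    return [&quot;buy&quot;, &quot;expedite&quot;, &quot;reallocate&quot;] if duplicates else []&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The sketch deliberately ignores lifecycle transitions such as reopen and reclose, so a real check would still need the order-policy context before it flags anything.&lt;/p&gt;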
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: I assert &lt;code&gt;qty_on_hand &gt;= 0&lt;/code&gt; before the goods-in-transit settlement window closes, so legitimate shipment-before-receipt timing gaps look like defects → Mitigation: I apply strict non-negative assertions only after settlement close, and I require reason codes inside the open window.&lt;/li&gt;
&lt;li&gt;Breaks when: I run duplicate-PO checks without lifecycle context during ERP reopen/reclose cycles, so legitimately reopened lines are flagged as duplicates → Mitigation: I scope duplicate checks to active statuses plus lifecycle transition rules from the order-policy contract.&lt;/li&gt;
&lt;li&gt;Breaks when: I compare forecast and actual demand before SKU-region forecasts are disaggregated to SKU-DC, so missing DC rows are interpreted as zero demand → Mitigation: I enforce disaggregation and completeness checks at SKU-DC grain before replenishment logic runs.&lt;/li&gt;
&lt;li&gt;Breaks when: I trust status snapshots only and ignore event-stream loss during OMS→planning transport, so dropped cancellation events create phantom open backlog → Mitigation: I run event-vs-snapshot reconciliation and block planner release on unresolved cancellation gaps.&lt;/li&gt;
&lt;li&gt;Breaks when: I treat RM/FG &lt;code&gt;product_type&lt;/code&gt; as static during new-item onboarding, so temporary classification drift enters FG ATP logic → Mitigation: I gate FG availability on explicit transformation rules and quarantine UNKNOWN/RM classes from sellable stock.&lt;/li&gt;
&lt;li&gt;Breaks when: I leave first-response ownership generic across plants and shifts, so failed checks sit unclaimed through weekend planning cycles → Mitigation: I assign a named owner and a response SLA per category and route alerts by calendar coverage.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Publish one planner trust check card for your highest-risk planning feed this week, and keep buy/expedite/reallocate blocked until every line on the card is green.&lt;/p&gt;
&lt;p&gt;For a supply-chain anomaly already under debate, I’m happy to compare notes on the category, blocked-action, and owner boundary that keeps a bad planning slice from becoming a bad planner decision.&lt;/p&gt;</content:encoded><category>supply-chain</category><category>data-quality</category><category>operations</category></item><item><title>Handling late-arriving supply chain data without rewriting history by hand</title><link>https://berhanturkkaynagi.com/blog/posts/late-arriving-supply-chain-data/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/late-arriving-supply-chain-data/</guid><description>For late ASN and shipment confirmations, I treat weekly OTIF as preliminary inside a restatement window so routine corrections settle without manual report patching.</description><pubDate>Sat, 25 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;On Monday morning, I opened the weekly supply-chain review deck and watched last week’s OTIF move before the meeting started.&lt;/p&gt;
&lt;p&gt;Supplier North sent a shipment confirmation two days late. Friday’s OTIF was published from the data available at close. Monday’s refresh pulled in the new event and the result moved. That is when a team either trusts a declared restatement contract or starts rewriting numbers by hand.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;In supply-chain systems, late ASN and shipment events are normal, but many review workflows still treat the first weekly value as final.&lt;/p&gt;
&lt;p&gt;When that assumption wins, every late confirmation becomes manual repair: patch the dashboard, explain the change in chat, then repeat the patch next week. The metric still moves, but the movement is hidden in ad hoc edits instead of a declared restatement policy.&lt;/p&gt;
&lt;p&gt;I already define KPI boundaries before review in &lt;a href=&quot;/blog/posts/supply-chain-numbers-business-review/&quot;&gt;The supply chain numbers I define before they reach a business review&lt;/a&gt;. The next boundary I set is restatement: when a late event is allowed to move OTIF, and when the period becomes settled.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;p&gt;I use one platform-agnostic restatement rule: rerun a fixed lookback window on every refresh and label the values inside that window as preliminary.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Keep OTIF tied to one event contract: late ASN, late shipment confirmation, or late POD/receipt update can restate recent periods.&lt;/li&gt;
&lt;li&gt;Recompute a fixed 14-day window on each run using &lt;code&gt;loaded_at&lt;/code&gt; as the restatement watermark.&lt;/li&gt;
&lt;li&gt;Mark every OTIF value inside the window as &lt;code&gt;preliminary_within_window&lt;/code&gt;; mark values outside the window as &lt;code&gt;settled_after_window&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Publish the same labels to consumers so planners know exactly when a number can still move.&lt;/li&gt;
&lt;li&gt;Keep inventory snapshot settlement as a separate policy surface from transactional late events; do not blend them into one rule.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is not a historical backfill-validation workflow like &lt;a href=&quot;/blog/posts/metric-definition-survives-backfill/&quot;&gt;How I validate a metric after a backfill&lt;/a&gt;. It is the day-to-day restatement contract for routine late-arriving corrections.&lt;/p&gt;
&lt;h2 id=&quot;example-the-restatement-window-policy-card-i-run&quot;&gt;Example: the restatement window policy card I run&lt;/h2&gt;
&lt;p&gt;Last week, Supplier North OTIF first landed at 91% on Friday close. A shipment confirmation arrived Sunday night with &lt;code&gt;loaded_at = 2026-04-19 22:14:00&lt;/code&gt;, tied to orders shipped in the prior week. Monday’s run recomputed the 14-day window and that week’s OTIF moved to 88%.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Restatement window policy card&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Metric: supplier_otif&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Window duration: 14 days&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Trigger event: late ASN, late shipment confirmation, late POD/receipt update&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Preliminary label: preliminary_within_window&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Settled label: settled_after_window&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Before&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Weekly OTIF was frozen at first publish.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Late confirmations triggered manual dashboard edits and one-off SQL updates.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;After&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Every refresh reruns the latest 14-day window using the loaded_at watermark.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Supplier North late confirmations restate OTIF automatically inside the window.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Periods older than 14 days stay settled unless a deliberate backfill is approved.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The behavior stays visible and predictable: last week’s OTIF can move when the supplier event arrives late, and the movement happens through the declared window, not analyst cleanup.&lt;/p&gt;
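&lt;p&gt;The labeling half of that card can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: &lt;code&gt;restatement_label&lt;/code&gt;, &lt;code&gt;period_end&lt;/code&gt;, and &lt;code&gt;run_date&lt;/code&gt; are hypothetical names, and treating a period that ended exactly 14 days before the run as still inside the window is my assumption, not something the card specifies.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;from datetime import date, timedelta&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;WINDOW_DAYS = 14  # window duration from the policy card&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;def restatement_label(period_end, run_date, window_days=WINDOW_DAYS):&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    &quot;&quot;&quot;Label a period as still movable or settled on a given run date.&quot;&quot;&quot;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    window_start = run_date - timedelta(days=window_days)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    # Assumption: the window boundary itself counts as inside the window.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    if period_end &gt;= window_start:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;        return &quot;preliminary_within_window&quot;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    return &quot;settled_after_window&quot;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With a Monday run on 2026-04-20, a week that ended on 2026-04-17 would be labeled &lt;code&gt;preliminary_within_window&lt;/code&gt;, which is why the Supplier North number was still allowed to move. The sketch covers only the label; recomputing OTIF from events selected by &lt;code&gt;loaded_at&lt;/code&gt; is a separate step.&lt;/p&gt;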
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: the lookback window is shorter than real supplier latency patterns → Mitigation: set the window from observed late-arrival distribution and review it quarterly.&lt;/li&gt;
&lt;li&gt;Breaks when: downstream users read preliminary values as final → Mitigation: surface &lt;code&gt;preliminary_within_window&lt;/code&gt; and &lt;code&gt;settled_after_window&lt;/code&gt; in the same table used in the review.&lt;/li&gt;
&lt;li&gt;Breaks when: teams use this policy to hide structural source issues → Mitigation: keep planner-trust quality assertions separate and escalate recurring source defects as their own workstream.&lt;/li&gt;
&lt;li&gt;Breaks when: transactional events and inventory snapshots share one restatement rule → Mitigation: maintain independent settlement policies so each surface reflects its own timing behavior.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Pick one supplier-facing KPI this week, publish its restatement window policy card, and mark which periods are still preliminary before the next business review.&lt;/p&gt;
&lt;p&gt;If you want to compare latency patterns and choose a window that your planners will trust, bring one recent late-arrival example and we can pressure-test the policy together.&lt;/p&gt;</content:encoded><category>supply-chain</category><category>data-reliability</category><category>backfills</category></item><item><title>The supply chain numbers I define before they reach a business review</title><link>https://berhanturkkaynagi.com/blog/posts/supply-chain-numbers-business-review/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/supply-chain-numbers-business-review/</guid><description>Before supply-chain KPIs reach a business review, I define timing, exclusions, grain, owner, and decision use so one label does not hide planning, warehouse, and finance conflicts.</description><pubDate>Fri, 24 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A weekly supply-chain business review can burn an hour arguing over which version of fill rate is right and still leave without deciding whether service, fulfillment, or working capital needs action this week.&lt;/p&gt;
&lt;p&gt;Fill rate, on-time delivery, and inventory turns sound like shared language until planning, warehouse operations, and finance each carry their own clock, grain, and exclusions into the room under the same labels. Once the deck shows one number per label, the meeting becomes a dashboard debate instead of a decision.&lt;/p&gt;
&lt;p&gt;I would rather define each KPI’s boundary before the review than untangle it in the meeting. If one supply-chain label cannot hold the same owner, clock, grain, exclusions, and intended decision for planning, warehouse, and finance at once, I split it and name the variants before anyone presents a number.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Supply-chain KPI labels often look stable until teams ask what they include.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Fill rate&lt;/code&gt; can mean complete orders filled from available stock, order lines filled, units shipped within a short window, or demand covered after backorder recovery. &lt;code&gt;On-time delivery&lt;/code&gt; can be measured against requested date, original promise date, latest customer-accepted promise date, planned ship date, actual ship timestamp, or actual delivery timestamp. &lt;code&gt;Inventory turns&lt;/code&gt; can be a finance-owned value calculation for a fiscal period or an operational shortcut for SKU movement through a warehouse.&lt;/p&gt;
&lt;p&gt;These are not cosmetic wording gaps. Each one changes who owns the number, which clock it runs on, and which decision the review is allowed to make with it.&lt;/p&gt;
&lt;p&gt;The pattern I watch for is a review deck with three headline KPIs and no boundary card behind them. Planning reads &lt;code&gt;fill rate&lt;/code&gt; as demand coverage. Warehouse operations reads the same label as complete customer orders filled from stock on the first execution attempt. Finance reads it as a service signal that should support or contradict the working-capital story. The label is shared; the definitions underneath are not, and the decisions they imply do not overlap.&lt;/p&gt;
&lt;p&gt;That is the same failure shape I watch for in &lt;a href=&quot;/blog/posts/stop-metric-drift-across-dashboards/&quot;&gt;metric drift across dashboards&lt;/a&gt;. The supply-chain version is harder to catch because the disagreement usually starts before any dashboard exists, inside the definitions teams carry into the meeting.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;p&gt;Before the number enters the review deck, I write a short KPI boundary card.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Start with the review decision, not the formula. The card should state, in one line, the question this metric is allowed to answer in this meeting and who gets to act on the answer.&lt;/li&gt;
&lt;li&gt;Lock the clock. Order date, requested date, original promise date, latest customer-accepted promise date, planned ship date, actual ship timestamp, actual delivery timestamp, data-complete cutoff, and fiscal period are different clocks, and the card picks one on purpose.&lt;/li&gt;
&lt;li&gt;Write exclusions in business language. Cancelled orders, customer reschedules, substitutions, backorders, partial shipments, carrier exceptions, returns, obsolete inventory, and consignment stock each need an explicit keep-or-drop decision, not a buried SQL filter.&lt;/li&gt;
&lt;li&gt;Name the grain before aggregation. Order, order line, unit, shipment, SKU-location-day, and fiscal-period inventory value each answer a different question, and averaging them silently is how a supposedly shared metric drifts.&lt;/li&gt;
&lt;li&gt;Assign one named owner who can approve changes to timing, exclusions, or grain without having to ask another team first.&lt;/li&gt;
&lt;li&gt;Decide up front whether the review gets one shared definition or separately named variants, and record the decision on the card. Do not defer that call to the meeting.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The last bullet carries the most weight. Splitting a KPI is not a naming exercise; it is an admission that one shared label would otherwise make planning, warehouse, or finance wrong in order to make another team precise.&lt;/p&gt;
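&lt;p&gt;One way to keep that checklist honest is to treat the card as a small record that can report which items are still blank. The sketch below is only an illustration, not an existing tool: the &lt;code&gt;BoundaryCard&lt;/code&gt; class, its field names, and the sample values are assumptions made up for this example.&lt;/p&gt;

```python
from dataclasses import dataclass, field


# Hypothetical structure: the fields mirror the checklist above,
# not any real schema or library.
@dataclass
class BoundaryCard:
    label: str
    decision: str = ""   # the one question this metric may answer
    clock: str = ""      # the single timing source picked on purpose
    grain: str = ""      # order, order line, unit, shipment, ...
    owner: str = ""      # one named owner
    exclusions: dict = field(default_factory=dict)  # item -> "keep" or "drop"
    variants: list = field(default_factory=list)    # named variants, if split

    def missing_fields(self):
        """Return the checklist items that are still blank."""
        required = {
            "decision": self.decision,
            "clock": self.clock,
            "grain": self.grain,
            "owner": self.owner,
        }
        missing = [name for name, value in required.items() if not value.strip()]
        if not self.exclusions:
            missing.append("exclusions")
        return missing

    def ready_for_deck(self):
        return not self.missing_fields()


draft = BoundaryCard(
    label="fill rate",
    decision="did fulfillment fill orders completely from stock?",
    grain="order",
)
print(draft.missing_fields())  # ['clock', 'owner', 'exclusions']
```

&lt;p&gt;A card that still lists missing items is a prompt for a conversation before the meeting, not something to automate away.&lt;/p&gt;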
&lt;h2 id=&quot;example-the-boundary-card-i-want-before-the-review&quot;&gt;Example: the boundary card I want before the review&lt;/h2&gt;
&lt;p&gt;Here is a weekly supply-chain business review whose definition gaps I would rather catch before the deck ships than defend afterward.&lt;/p&gt;
&lt;p&gt;The service-and-inventory slide currently reads:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Weekly supply-chain business review&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- fill rate: 92%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- on-time delivery: 89%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- inventory turns: 5.1&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The disagreement starts before the slide is up.&lt;/p&gt;
&lt;p&gt;Planning puts fill rate at 96%, counting units eventually shipped against the demand created that week. Warehouse operations puts it at 88%: complete customer orders filled from available stock on the first execution attempt, with cancellations and backordered lines excluded. Finance wants neither version anywhere near the close, because partial shipments and backorder recovery push revenue across period boundaries.&lt;/p&gt;
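&lt;p&gt;A toy calculation shows how the same shipments can produce two honest fill rates. Everything here is an assumption for illustration: the five order lines are invented, the function names are hypothetical, and the sketch ignores the cancellation and backorder exclusions a real warehouse definition would apply. It does not reproduce the 96% and 88% figures above.&lt;/p&gt;

```python
# Toy order lines: (order_id, units_ordered, units_first_attempt, units_eventually)
lines = [
    ("A", 10, 10, 10),
    ("A", 5, 5, 5),
    ("B", 8, 4, 8),   # partial first shipment, recovered later
    ("C", 6, 6, 6),
    ("D", 4, 0, 2),   # backordered, only partly recovered
]


def unit_fill_eventual(rows):
    """Planning-style view: units eventually shipped over units ordered."""
    ordered = sum(row[1] for row in rows)
    shipped = sum(row[3] for row in rows)
    return shipped / ordered


def order_fill_first_attempt(rows):
    """Warehouse-style view: share of orders fully filled on the first attempt."""
    complete = {}
    for order_id, ordered, first_attempt, _eventual in rows:
        line_ok = first_attempt == ordered
        complete[order_id] = complete.get(order_id, True) and line_ok
    return sum(complete.values()) / len(complete)


# 31 of 33 units eventually shipped; 2 of 4 orders complete on first attempt.
print(f"unit fill, eventual: {unit_fill_eventual(lines):.1%}")
print(f"order fill, first attempt: {order_fill_first_attempt(lines):.1%}")
```

&lt;p&gt;Neither function is wrong. They differ in grain and clock, which is exactly what the shared label hides.&lt;/p&gt;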
&lt;p&gt;On-time delivery repeats the shape. Transportation reports 91% against the latest customer-accepted promise date and the actual delivery timestamp. Warehouse operations prefers 95%, measured against planned ship date and actual ship timestamp, and stops the clock there. Sales asks why an order that missed the original promise date still counts as on time after a customer reschedule quietly rewrote the target.&lt;/p&gt;
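&lt;p&gt;The same effect can be sketched for a single order judged against three clocks. The dates, the &lt;code&gt;CLOCKS&lt;/code&gt; mapping, and the variant names are hypothetical, chosen only to mirror the three positions described above.&lt;/p&gt;

```python
from datetime import date

# One invented order with every timestamp the teams argue about.
order = {
    "original_promise": date(2026, 5, 4),
    "latest_accepted_promise": date(2026, 5, 8),  # after a customer reschedule
    "planned_ship": date(2026, 5, 5),
    "actual_ship": date(2026, 5, 5),
    "actual_delivery": date(2026, 5, 7),
}

# Each variant names the target it is measured against and the event it measures.
CLOCKS = {
    "delivery_vs_latest_promise": ("latest_accepted_promise", "actual_delivery"),
    "ship_vs_planned_ship": ("planned_ship", "actual_ship"),
    "delivery_vs_original_promise": ("original_promise", "actual_delivery"),
}


def on_time(order_row, clock_name):
    """True when the measured event happened on or before the target date."""
    target_field, actual_field = CLOCKS[clock_name]
    return order_row[target_field] >= order_row[actual_field]


for name in CLOCKS:
    print(name, on_time(order, name))
```

&lt;p&gt;The order is on time for transportation and for the warehouse, and late for sales. Locking the clock on the card decides which of those verdicts the review is allowed to report.&lt;/p&gt;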
&lt;p&gt;Inventory turns behaves like one number until someone asks what the denominator is. Finance divides COGS for the completed fiscal period by finished-goods inventory value averaged across that same period. Planning is watching SKU velocity and days of supply. Warehouse operations is watching movement and capacity pressure against a physical footprint. All three views answer real questions, but they are three different metrics, and they should not share the single label &lt;code&gt;inventory turns&lt;/code&gt; on an executive slide.&lt;/p&gt;
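&lt;p&gt;Written out, the finance and planning views do not even share units. This is a minimal sketch with invented figures; the function names are hypothetical, and the days-of-supply formula is the common on-hand-over-daily-demand form, assumed here rather than taken from any specific planning system.&lt;/p&gt;

```python
def finance_inventory_turns(cogs, period_inventory_values):
    """COGS for the period divided by average inventory value over the same period."""
    average_value = sum(period_inventory_values) / len(period_inventory_values)
    return cogs / average_value


def days_of_supply(on_hand_units, avg_daily_demand_units):
    """On-hand units divided by average daily demand, in days."""
    return on_hand_units / avg_daily_demand_units


# Invented figures: currency for finance, units for planning.
turns = finance_inventory_turns(
    cogs=1_200_000,
    period_inventory_values=[900_000, 1_000_000, 1_100_000],
)
supply_days = days_of_supply(on_hand_units=4_500, avg_daily_demand_units=150)

print(f"finance turns for the period: {turns:.2f}")
print(f"planning days of supply: {supply_days:.1f}")
```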
&lt;p&gt;Here is the boundary card I would write before the next deck goes out, with one owner and one intended decision per approved metric:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Supply-chain KPI boundary card&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Review: weekly supply-chain business review&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Decision: which numbers are safe as shared review KPIs, and which need named variants&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;1. fill rate&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- review label: fill rate&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- approved review metric: warehouse_order_fill_rate&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- intended decision: did fulfillment fill customer orders completely from available stock on the first execution attempt?&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- owner: warehouse operations / fulfillment analytics&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- timing: first ship attempt or agreed ship-from-stock window&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- grain: order-level complete fill unless the card explicitly says order-line or unit fill&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- exclusions to state: cancellations, customer future-dated orders, substitutions, split shipments, manual holds, test orders, backorders&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- split decision: planning gets a separate supply-availability metric; finance does not use this number for close or working-capital review&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;2. on-time delivery&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- review label: on-time delivery&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- approved review metric: customer_on_time_delivery&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- intended decision: did delivered orders meet the agreed customer promise window?&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- owner: transportation / logistics, with customer-service ownership for promise-date policy&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- timing: promise date source and actual delivery timestamp are locked before review&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- grain: complete order unless the card explicitly says shipment or order line&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- exclusions to state: customer reschedules, carrier exceptions, missing proof of delivery, cancelled orders, pickup orders, early-but-incomplete deliveries&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- split decision: warehouse ship timeliness and customer delivery performance stay separate if they use different clocks&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;3. inventory turns&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- review label: inventory turns&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- approved review metric: finance_finished_goods_inventory_turns&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- intended decision: is finished-goods working capital moving in the right direction for the completed fiscal period?&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- owner: finance / supply-chain finance&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- timing: completed fiscal month, quarter, or year; numerator and denominator use the same period&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- grain: financial value over time, not unit movement&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- exclusions to state: obsolete/excess inventory, consignment stock, intercompany inventory, raw/WIP inventory, returns, write-downs, cost policy&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- split decision: planning gets days of supply or SKU velocity; warehouse gets movement or capacity metrics; finance owns official turns&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Business-review rule&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- If owner, clock, grain, exclusions, and decision do not match across planning, warehouse, and finance, the deck shows named variants instead of one shared KPI label.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
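&lt;p&gt;The business-review rule at the bottom of the card can be read as a simple comparison. The sketch below assumes each team writes its definition as a dictionary with the five card fields. The &lt;code&gt;deck_labels&lt;/code&gt; function and the placeholder variant names are hypothetical; real variants would get deliberate names such as &lt;code&gt;warehouse_order_fill_rate&lt;/code&gt;.&lt;/p&gt;

```python
FIELDS = ("owner", "clock", "grain", "exclusions", "decision")


def deck_labels(label, definitions):
    """Return (labels to show on the deck, card fields that disagree).

    definitions maps a team name to a dict holding every key in FIELDS.
    """
    teams = list(definitions)
    baseline = definitions[teams[0]]
    mismatched = [
        name
        for name in FIELDS
        if any(definitions[team][name] != baseline[name] for team in teams[1:])
    ]
    if not mismatched:
        return [label], mismatched
    # Placeholder variant names; a real card would name each variant on purpose.
    return [f"{label} ({team})" for team in teams], mismatched


definitions = {
    "planning": {
        "owner": "planning",
        "clock": "demand week",
        "grain": "unit",
        "exclusions": "none stated",
        "decision": "demand coverage",
    },
    "warehouse": {
        "owner": "warehouse operations",
        "clock": "first ship attempt",
        "grain": "order",
        "exclusions": "cancellations, backorders",
        "decision": "first-attempt fulfillment",
    },
}

labels, mismatched = deck_labels("fill rate", definitions)
print(labels)
print(mismatched)
```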
&lt;p&gt;The resolution is not to average the versions or pick the dashboard backed by the most confident owner.&lt;/p&gt;
&lt;p&gt;The resolution is to split the label and assign each variant to the audience and decision it serves. Planning keeps a named supply-availability or service metric for demand coverage. Warehouse operations owns &lt;code&gt;warehouse_order_fill_rate&lt;/code&gt; for complete first-attempt fulfillment. Transportation owns &lt;code&gt;customer_on_time_delivery&lt;/code&gt; for the promise-window question, with customer service owning the promise-date policy behind it. Finance owns &lt;code&gt;finance_finished_goods_inventory_turns&lt;/code&gt; for finished-goods working capital across the completed fiscal period.&lt;/p&gt;
&lt;p&gt;Once those names exist, the review can still show more than one number. The difference is that each number states which decision it supports and which owner can change the definition without a cross-team negotiation in the meeting.&lt;/p&gt;
&lt;p&gt;This is also why I do not treat the problem as a self-service dashboard issue. A dashboard can only travel safely after the metric boundary is honest. Before a supply-chain number reaches the review, I apply the same caution I use before I &lt;a href=&quot;/blog/posts/dashboard-self-service-checks/&quot;&gt;label a dashboard self-service&lt;/a&gt;: write the allowed question, the intended audience, and the explicit not-for cases, then decide whether the single shared label still fits at all.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: one KPI label is forced to serve planning, warehouse, and finance decisions → Mitigation: split the metric into separately named definitions with the owner and intended decision on the card.&lt;/li&gt;
&lt;li&gt;Breaks when: the formula is written but the clock is not → Mitigation: define the timing source before review and block after-the-fact swaps between order date, ship date, delivery date, promise date, and fiscal period.&lt;/li&gt;
&lt;li&gt;Breaks when: partial shipments make the numerator look better for one audience and worse for another → Mitigation: choose order, order-line, unit, or shipment grain deliberately and keep partial-shipment handling visible.&lt;/li&gt;
&lt;li&gt;Breaks when: inventory turns are treated like a warehouse velocity metric in one slide and a finance working-capital metric in the next → Mitigation: reserve official turns for finance-owned value over a fiscal period, then create separate planning or warehouse operational metrics when needed.&lt;/li&gt;
&lt;li&gt;Breaks when: the boundary card turns into a governance ceremony for every low-risk operational number → Mitigation: reserve the full card for numbers that reach cross-functional reviews or executive summaries.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; For one supply-chain number already in a business review, write the boundary card before the next meeting: owner, clock, grain, exclusions, and the first decision it supports.&lt;/p&gt;
&lt;p&gt;Named variants usually create calmer reviews than one overloaded KPI label carrying planning, warehouse, and finance decisions at once.&lt;/p&gt;</content:encoded><category>supply-chain</category><category>metrics</category><category>operations</category></item><item><title>Azure DevOps checks before analytics code reaches production</title><link>https://berhanturkkaynagi.com/blog/posts/azure-devops-checks-analytics-code-production/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/azure-devops-checks-analytics-code-production/</guid><description>For analytics repos in Azure DevOps, I want one pre-production record that shows PR validation, dbt compile, changed-scope checks, published evidence, and environment approval before promotion.</description><pubDate>Fri, 24 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The Azure DevOps setup I care about for analytics repos is not the cleverest YAML file.&lt;/p&gt;
&lt;p&gt;It is the record that lets me say whether this change can move closer to production.&lt;/p&gt;
&lt;p&gt;Before a metric definition, model, or dashboard-facing transform gets promoted, I want the delivery decision visible in one place: the protected-branch PR gate, the short validation chain, the evidence from failed checks or risky changes, and the approval boundary for the next environment.&lt;/p&gt;
&lt;p&gt;If I need a Slack thread to reconstruct those facts, the pipeline is still hiding the decision.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Analytics code can fail with the same shape as application code. A small pull request changes one definition, the code review looks harmless, and the number that moves later is the one finance or operations cares about.&lt;/p&gt;
&lt;p&gt;The weak spot is usually not that Azure DevOps cannot run enough checks. It is that the delivery record is too thin. A PR has a green badge, a reviewer remembers that tests usually run, and a promotion stage succeeds because the previous stage looked healthy. Later, when net revenue changes, nobody can tell whether the protected-branch policy guarded the merge, whether &lt;code&gt;dbt compile&lt;/code&gt; ran on the changed scope, or whether a red run would have published enough evidence for the first responder.&lt;/p&gt;
&lt;p&gt;That is the gap I want Azure DevOps to close: not a platform tour, not a YAML showcase, just one inspectable chain from pull request to promotion boundary.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;p&gt;This post relies on the Azure Repos Git branch-policy and Azure Pipelines behavior documented in Microsoft Learn, which I checked on 2026-03-24.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Protect the production branch with Azure Repos Git branch policies and required build validation. For Azure Repos Git, that branch-policy build validation is the PR gate I trust; push-trigger CI is supporting evidence, not the merge boundary.&lt;/li&gt;
&lt;li&gt;Keep PR validation short enough to debug: lint, repo tests, &lt;code&gt;dbt compile&lt;/code&gt;, and one changed-model or changed-path check tied to the repo’s real risk.&lt;/li&gt;
&lt;li&gt;Publish test results into the pipeline summary so a failed run names the broken check before anyone opens raw logs.&lt;/li&gt;
&lt;li&gt;Publish the smallest useful investigation artifacts from validation: compile output, changed-node or changed-path evidence, failed SQL snippets when present, and a release note that names the rollback reference.&lt;/li&gt;
&lt;li&gt;Use stage conditions deliberately. Validation, evidence publication, and promotion should be separate decisions. A custom condition replaces the default one, so a promotion stage that depends on validation should keep &lt;code&gt;succeeded()&lt;/code&gt; in its condition.&lt;/li&gt;
&lt;li&gt;Put production-adjacent promotion behind an environment approval or check controlled outside ordinary pipeline edits.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That is the job I want Azure DevOps doing here. Enforce the delivery decision. Preserve the evidence. Do not become a second analytics system.&lt;/p&gt;
&lt;h2 id=&quot;example-the-pre-production-check-record-i-want-attached-to-the-pr&quot;&gt;Example: the pre-production check record I want attached to the PR&lt;/h2&gt;
&lt;p&gt;Imagine a pull request that changes the net revenue definition used by a finance-facing mart. The diff is small: one calculation changes how refunds are excluded from revenue after a source-system fix.&lt;/p&gt;
&lt;p&gt;That is exactly the sort of analytics change I do not want waved through because the pull request is short.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Azure DevOps pre-production check record&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Change: metric definition update for net revenue&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;PR: #418 -&gt; main&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Protected branch policy: required build validation passed&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Validation trigger model: Azure Repos Git branch-policy build validation, not only a generic CI push trigger&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Validation stage&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- branch validation: required build validation passed on the protected branch&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- lint: passed&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- repo tests: passed&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- dbt compile: passed&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- changed-scope check: changed models and changed paths reviewed&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Published evidence&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- test results: visible in the pipeline summary&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- investigation artifacts: compile output, changed-node evidence, failed-query snippets if present&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- release note: metric definition changed; rollback path points to previous definition commit and artifact&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Promotion boundary&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- preproduction stage condition: only after validation succeeds&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- environment approval/check: approved before production-adjacent promotion&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- decision: merge and promotion allowed because checks, evidence, and approval are visible in one record&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first line I check is the branch boundary. A pipeline that happens to run on pushes is not the same thing as protected PR validation. I want the protected branch to require build validation before completion, so the merge decision is tied to the policy that guards the branch.&lt;/p&gt;
&lt;p&gt;The validation stage stays intentionally boring. &lt;code&gt;lint&lt;/code&gt; catches formatting and static mistakes. Repo tests catch local behavior. &lt;code&gt;dbt compile&lt;/code&gt; proves the project can still parse, resolve references, and render the changed graph. The changed-model or changed-path check keeps a one-line metric edit from looking risk-free.&lt;/p&gt;
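&lt;p&gt;A changed-path check does not need to be elaborate to be useful. The sketch below assumes the repo keeps models as &lt;code&gt;.sql&lt;/code&gt; files under a &lt;code&gt;models/&lt;/code&gt; directory and that a model is named after its file. The &lt;code&gt;FINANCE_FACING_MODELS&lt;/code&gt; set, the model names, and the file paths are hypothetical. It only looks at files touched by the pull request and does not follow downstream dependencies.&lt;/p&gt;

```python
from pathlib import PurePosixPath

# Hypothetical list of models whose changes should be called out for finance.
FINANCE_FACING_MODELS = {"fct_net_revenue"}


def changed_models(changed_paths):
    """Return model names (file stems) for changed .sql files under models/."""
    names = set()
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path.parts[:1] == ("models",) and path.suffix == ".sql":
            names.add(path.stem)
    return sorted(names)


def scope_report(changed_paths):
    """Summarize which changed models are on the finance-facing list."""
    models = changed_models(changed_paths)
    flagged = sorted(FINANCE_FACING_MODELS.intersection(models))
    return {"changed_models": models, "flagged_for_finance": flagged}


report = scope_report([
    "models/marts/finance/fct_net_revenue.sql",
    "models/staging/stg_refunds.sql",
    "README.md",
])
print(report)
```

&lt;p&gt;Publishing that small report as a build artifact is what turns a one-line diff into visible changed-scope evidence.&lt;/p&gt;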
&lt;p&gt;I am not trying to turn this post into a dbt deployment guide. The deployment mechanics belong in &lt;a href=&quot;/blog/posts/dbt-core-deployments-boring-production/&quot;&gt;How I keep dbt Core deployments boring in production&lt;/a&gt;. Here, &lt;code&gt;dbt compile&lt;/code&gt; and changed-scope evidence earn their slot because they are gates in the Azure DevOps record.&lt;/p&gt;
&lt;p&gt;Published evidence is where a lot of otherwise decent pipelines get thin. A red run should not leave the next person guessing which test failed, where the compile output went, or whether the changed-scope check found a wider path than expected. I want that evidence attached to the run while the context is still fresh.&lt;/p&gt;
&lt;p&gt;The promotion boundary is separate on purpose. Validation steps live in the pipeline. The production-adjacent approval or check should be owned as a protected resource decision, not hidden inside the same editable YAML path as ordinary validation. That boundary keeps &lt;code&gt;the run was green&lt;/code&gt; from silently turning into &lt;code&gt;production can move automatically&lt;/code&gt;.&lt;/p&gt;
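&lt;p&gt;The decision line of the record can be expressed as a list of blockers rather than a single green badge. This is a minimal sketch of that idea, assuming the record is kept as plain booleans. The &lt;code&gt;CheckRecord&lt;/code&gt; class and its field names are hypothetical and mirror the example record above. It describes the record, and it does not replace the environment approval itself.&lt;/p&gt;

```python
from dataclasses import dataclass


# Hypothetical record: one boolean per line of the check record above.
@dataclass
class CheckRecord:
    branch_policy_validation: bool
    lint: bool
    repo_tests: bool
    dbt_compile: bool
    changed_scope_reviewed: bool
    test_results_published: bool
    artifacts_published: bool
    environment_approved: bool


def promotion_blockers(record):
    """Return the name of every check that is not yet satisfied."""
    return [name for name, ok in vars(record).items() if not ok]


record = CheckRecord(
    branch_policy_validation=True,
    lint=True,
    repo_tests=True,
    dbt_compile=True,
    changed_scope_reviewed=True,
    test_results_published=True,
    artifacts_published=False,
    environment_approved=False,
)

blockers = promotion_blockers(record)
print("promote" if not blockers else f"hold: {', '.join(blockers)}")
```

&lt;p&gt;A run where every validation step passed still reports a hold here, which is the distinction between a green run and an approved promotion.&lt;/p&gt;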
&lt;p&gt;The same logic applies when the risky part of the change is the model-level test coverage. &lt;a href=&quot;/blog/posts/dbt-tests-business-critical-models/&quot;&gt;The dbt tests I write first for business-critical models&lt;/a&gt; explains how I choose those tests. This record explains where Azure DevOps should enforce those tests, publish their results, and stop promotion until the evidence is visible.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: the team relies on a generic push trigger and calls it PR protection → Mitigation: anchor Azure Repos Git PR validation to branch-policy build validation on the protected branch, then treat CI triggers as supporting context.&lt;/li&gt;
&lt;li&gt;Breaks when: the YAML becomes more impressive than the release decision → Mitigation: keep validation stages short and tie every step to merge safety, failure investigation, or promotion control.&lt;/li&gt;
&lt;li&gt;Breaks when: test failures only exist in raw logs → Mitigation: publish test results and the smallest useful investigation artifacts so the first responder can see the failed check and next action quickly.&lt;/li&gt;
&lt;li&gt;Breaks when: stage conditions skip too much or run after failed prerequisites → Mitigation: keep success conditions explicit and preserve &lt;code&gt;succeeded()&lt;/code&gt; where the prior stage must pass.&lt;/li&gt;
&lt;li&gt;Breaks when: environment approvals live inside the same editable pipeline logic as ordinary validation steps → Mitigation: use approvals and checks on protected resources so resource owners control the promotion boundary outside YAML edits.&lt;/li&gt;
&lt;li&gt;Breaks when: &lt;code&gt;dbt compile&lt;/code&gt; or changed-model checks pull the conversation back into dbt deployment mechanics → Mitigation: keep them framed as gates inside the Azure DevOps record, then link out to the dbt deployment and dbt test-ordering posts for mechanics.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Pick one analytics repo and write the check record you would want attached to the next metric definition change before it merges: branch-policy validation, lint, tests, compile, changed scope, published evidence, and environment approval.&lt;/p&gt;
&lt;p&gt;If a release still needs Slack archaeology after a green Azure DevOps run, use that evidence record to make the next merge explain itself from policy to approval.&lt;/p&gt;</content:encoded><category>analytics-engineering</category><category>data-pipelines</category><category>operations</category></item><item><title>What I check before I label a dashboard self-service</title><link>https://berhanturkkaynagi.com/blog/posts/dashboard-self-service-checks/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/dashboard-self-service-checks/</guid><description>I call a dashboard self-service only when a review card makes its question, audience, grain, filters, timing, and “not for” cases visible before broader reuse.</description><pubDate>Thu, 23 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A dashboard is not self-service just because a lot of people can open it.&lt;/p&gt;
&lt;p&gt;I use that label only when the promise survives casual reuse. If a forwarded link can pull a new reader past the intended question, audience, grain, filters, or timing boundary, the label has outrun the dashboard.&lt;/p&gt;
&lt;p&gt;I do not blame the finance reviewer for opening a familiar operations view. I blame the label when it travels farther than the logic underneath it.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Some dashboards are reliable for the team that built them and still unsafe as self-service assets.&lt;/p&gt;
&lt;p&gt;I see this when an operations dashboard gets reused in finance close because the chart names look familiar and the link is easy to share. One tile mixes booked orders and shipped orders. Cancelled-order handling and sandbox exclusions stay buried in SQL. The refresh is still incomplete before &lt;code&gt;09:00 ET&lt;/code&gt;. Operations can still use it to spot same-day flow problems. Finance cannot use it to decide whether revenue is safe to close.&lt;/p&gt;
&lt;p&gt;That is not a careless-reader problem. It is a boundary problem. A dashboard earns the self-service label only when a casual reader can tell what question it answers, who it serves, what each number counts, which filters shape the answer, and when the data is complete enough to trust. If those boundaries are still implied, the honest label is team-scoped, split into another view, or simply not self-service yet.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;p&gt;I treat &lt;code&gt;self-service&lt;/code&gt; as a label decision, not a compliment. I want one short self-service review card beside the dashboard that tells a new reader what I allow, what I block, and where the label ends.&lt;/p&gt;
&lt;h3 id=&quot;what-i-allow-before-i-use-the-label&quot;&gt;What I allow before I use the label&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;One allowed question, written the way the operator would ask it. In this case: &lt;code&gt;Where is same-day order flow falling behind plan by ship-promise date?&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;One named audience. If the default reader is &lt;code&gt;operations managers and fulfillment leads&lt;/code&gt;, I write that down instead of implying &lt;code&gt;anyone with access&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Plain grain on every important tile: booked, shipped, or settled; one row, one order, or one shipment.&lt;/li&gt;
&lt;li&gt;Visible filter boundaries, including exclusions that would change the answer if a reader assumed the wrong scope.&lt;/li&gt;
&lt;li&gt;A timing note that says when the number is directional, when it is complete enough for the intended audience, and when it is unsafe for wider reuse.&lt;/li&gt;
&lt;li&gt;One explicit label outcome at the end: self-service now, split the view first, or keep it team-scoped.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;what-i-block-or-relabel&quot;&gt;What I block or relabel&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;I block the label when one dashboard is asked to answer both an operations follow-up and a finance close question. That is not broader reuse. That is two decision paths hiding in one view.&lt;/li&gt;
&lt;li&gt;I relabel when the audience boundary is effectively &lt;code&gt;whoever got the link&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;I stop the label when a tile mixes booked, shipped, and settled logic without naming the difference.&lt;/li&gt;
&lt;li&gt;I stop the label when the important filters only exist in SQL comments, dbt models, or someone’s explanation on a call.&lt;/li&gt;
&lt;li&gt;I keep the dashboard out of finance or executive reuse when the morning refresh is still incomplete but the page only says &lt;code&gt;updated daily&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If question, audience, grain, filters, and timing do not line up for the next reader, I do not widen the label. If the dashboard only fits the original team, I would rather say that plainly than pretend the word &lt;code&gt;self-service&lt;/code&gt; will teach the next reader where the boundary is.&lt;/p&gt;
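&lt;p&gt;The three label outcomes can be sketched as a small decision function. The ordering of the rules is my simplification for illustration, and the &lt;code&gt;label_outcome&lt;/code&gt; function and the card dictionary shape are hypothetical: more than one question means split, any boundary that is not visible to the reader means team-scoped, and only a fully visible card earns the label.&lt;/p&gt;

```python
def label_outcome(card):
    """Pick one of the three label outcomes for a review card.

    card["questions"] is a list of questions the dashboard is being asked.
    The other keys are True when that boundary is visible to a casual reader.
    """
    if len(card["questions"]) != 1:
        return "split the view first"
    boundaries = ("audience", "grain", "filters", "timing")
    if not all(card[name] for name in boundaries):
        return "keep it team-scoped"
    return "self-service now"


# Invented card for an operations dashboard that finance has started reusing.
order_flow_card = {
    "questions": [
        "Where is same-day order flow falling behind plan by ship-promise date?",
        "Is settled daily revenue safe to close?",
    ],
    "audience": True,
    "grain": False,
    "filters": False,
    "timing": False,
}

print(label_outcome(order_flow_card))
```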
&lt;h2 id=&quot;example-the-self-service-review-card-i-want-beside-the-dashboard&quot;&gt;Example: the self-service review card I want beside the dashboard&lt;/h2&gt;
&lt;p&gt;Here is the compact review card I would attach to a dashboard before I let anyone call it self-service:&lt;/p&gt;
&lt;p&gt;Dashboard self-service review&lt;/p&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;
&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Field&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Dashboard&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;daily order flow monitor&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Label decision&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;FAIL — not self-service yet&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;
&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Review step&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;What I confirm&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Status&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Question&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Allowed question: Where is same-day order flow falling behind plan by ship-promise date?
Current failure: finance is also using the dashboard to ask whether settled daily revenue is safe to close.&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;—&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Audience&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Intended audience: operations managers and fulfillment leads
Explicit not for: finance close, revenue reporting, external reporting&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;—&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Grain&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Current failure: one trend mixes booked orders and shipped orders in the same daily view
Pass condition: each tile names whether it is booked, shipped, or settled, and what one row / point represents&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;—&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Filter boundaries&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Must state location scope, cancelled-order handling, sandbox and test exclusions, and return handling
Current failure: those exclusions live in SQL but are invisible to the reader&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;—&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Timing assumptions&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Current data is intra-day and incomplete before 09:00 ET
Pass condition: the dashboard states when the number is directional, when it is complete enough for operations, and when it is not safe for finance&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;—&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Outcome&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Keep this dashboard team-scoped for operations, or split finance into a separate settled view before using the self-service label&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;—&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The question line does most of the work. &lt;code&gt;Where is same-day order flow falling behind plan by ship-promise date?&lt;/code&gt; is an operations question. &lt;code&gt;Can finance use this for settled daily revenue close?&lt;/code&gt; is a different question, with different timing needs and different counting rules.&lt;/p&gt;
&lt;p&gt;The audience line keeps the dashboard from pretending it is safe for every reader who can open it. Once I write &lt;code&gt;operations managers and fulfillment leads&lt;/code&gt; and add &lt;code&gt;not for finance close&lt;/code&gt;, I stop treating access as proof that the view is broadly reusable.&lt;/p&gt;
&lt;p&gt;The grain and filter lines keep a new reader from reverse-engineering the dashboard from memory or tribal knowledge. If a tile mixes booked and shipped logic, or if cancelled-order handling and test exclusions only live in SQL, the dashboard is still depending on insider context.&lt;/p&gt;
&lt;p&gt;The timing line decides whether the label is honest. If the number is incomplete before &lt;code&gt;09:00 ET&lt;/code&gt;, I want the card to say whether that is acceptable for operations and unsafe for finance. If two audiences need different timing and counting rules, I split the view before I stretch the promise.&lt;/p&gt;
&lt;p&gt;That is also where I keep this post separate from broader dashboard-trust topics. Once finance gets its own settled view and that view matters to leadership, I still want it to carry the explicit operating boundary from &lt;a href=&quot;/blog/posts/operating-spec-business-critical-dashboard/&quot;&gt;The operating spec I want before I trust a business-critical dashboard&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If the team still wants one shared headline number across both views, I treat that as a definition problem before I treat it as a training problem. That is when I pull up &lt;a href=&quot;/blog/posts/stop-metric-drift-across-dashboards/&quot;&gt;The definition card I use to stop metric drift across dashboards&lt;/a&gt;, because a self-service label cannot rescue a metric that still changes meaning by audience.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: teams use &lt;code&gt;self-service&lt;/code&gt; as access language instead of decision-safety language → Mitigation: tie the label to one named question and one named audience, not to the size of the permission group.&lt;/li&gt;
&lt;li&gt;Breaks when: one dashboard keeps absorbing booked, shipped, and settled logic under one title because separate views feel inconvenient → Mitigation: split the views or rename the tiles before widening the audience promise.&lt;/li&gt;
&lt;li&gt;Breaks when: the review card turns into ceremony for low-risk team dashboards that never travel beyond the original group → Mitigation: keep the artifact short and reserve the strict pass / fail gate for views likely to be reused casually.&lt;/li&gt;
&lt;li&gt;Breaks when: the card is written once and then ignored while filters, logic, or refresh expectations change → Mitigation: update the card in the same release path as dashboard logic or definition changes.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; For one dashboard that already travels outside its original audience, write the review card before the next forwarded link turns a useful team view into an unofficial company metric.&lt;/p&gt;
&lt;p&gt;The card earns its space when audience, question, and not-for boundaries are visible before people answer from the same chart in different meetings.&lt;/p&gt;</content:encoded><category>analytics-engineering</category><category>operations</category><category>dashboard-reliability</category></item><item><title>Where Microsoft Fabric fits and where I would keep it out of the critical path</title><link>https://berhanturkkaynagi.com/blog/posts/microsoft-fabric-critical-path-boundaries/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/microsoft-fabric-critical-path-boundaries/</guid><description>Where I use Microsoft Fabric in a finance and operations reporting path, and where I still keep semantic ownership, transform checks, and the first failure surface explicit before I trust the output.</description><pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Microsoft Fabric gets more useful to me when I stop asking whether it can do everything.&lt;/p&gt;
&lt;p&gt;The better question is narrower: which parts of a finance and operations reporting path should Fabric simplify, and which parts should stay explicit because trust breaks there first?&lt;/p&gt;
&lt;p&gt;That is where Fabric is strongest for a Microsoft-heavy team. OneLake, Lakehouse, Warehouse, shortcuts, and Power BI can cut real handoffs without forcing another stack debate.&lt;/p&gt;
&lt;p&gt;I still do not want that convenience to blur semantic ownership, transform checks, permission boundaries, or the moment a fast path becomes a slower one.&lt;/p&gt;
&lt;p&gt;If row meaning is still fuzzy, I start with &lt;a href=&quot;/blog/posts/every-important-model-needs-an-explicit-grain/&quot;&gt;Every important model needs an explicit grain&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If metric definition still drifts, I fix that next with &lt;a href=&quot;/blog/posts/stop-metric-drift-across-dashboards/&quot;&gt;The definition card I use to stop metric drift across dashboards&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Fabric can host those decisions. It does not make them for you.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;A Microsoft-heavy team has a reasonable instinct here.&lt;/p&gt;
&lt;p&gt;Power BI is already in place. Azure is already in place. Finance and operations want one path from raw ERP data to trusted reporting.&lt;/p&gt;
&lt;p&gt;The trouble starts when “one path” turns into one black box.&lt;/p&gt;
&lt;p&gt;Lakehouse, Warehouse, SQL analytics endpoint, shortcuts, security mode, and the semantic model can sit so close together that the path feels trustworthy just because the parts share a platform.&lt;/p&gt;
&lt;p&gt;That is not the same as knowing where the reporting definition lives, where a shortcut can fail, or when Direct Lake stops behaving the way the team assumed.&lt;/p&gt;
&lt;h2 id=&quot;where-fabric-earns-a-place&quot;&gt;Where Fabric earns a place&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Use Fabric where it removes real handoffs: OneLake for shared storage, Lakehouse for ingestion and transform work, Warehouse for refined SQL serving, and Power BI for the reporting surface.&lt;/li&gt;
&lt;li&gt;Pick the store by workload. Eventhouse fits high-volume event analysis. SQL database fits transactional work. Lakehouse plus Warehouse cover most reporting-path questions.&lt;/li&gt;
&lt;li&gt;Prefer Direct Lake on OneLake when OneLake security, broader modeling features, and in-memory behavior matter most. As of a 2026-02-21 check of the Microsoft docs, Direct Lake on OneLake does not use SQL endpoints or DirectQuery fallback, while Direct Lake on SQL uses the SQL analytics endpoint for discovery and permission checks and can fall back to DirectQuery for SQL views or SQL-based granular access control (&lt;a href=&quot;https://learn.microsoft.com/en-us/fabric/fundamentals/direct-lake-overview&quot;&gt;Direct Lake overview&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Use shortcuts only when the source owner, permission path, and first failure surface are explainable. Microsoft notes that the calling user must have permission on the shortcut target, and that Direct Lake over SQL or delegated T-SQL can pass the calling item’s owner identity instead of the user’s identity (&lt;a href=&quot;https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts&quot;&gt;OneLake shortcuts&lt;/a&gt;). Zero-copy is useful. Hidden dependency chains are not.&lt;/li&gt;
&lt;li&gt;Keep semantic ownership and model checks outside the platform promise. I still want named owners, trusted-table tests, and one release check before publishing. That is the same discipline behind &lt;a href=&quot;/blog/posts/dbt-tests-business-critical-models/&quot;&gt;The dbt tests I write first for business-critical models&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Review region limits, security mode, and deployment mapping before a report is called trusted. In the same 2026-02-21 Microsoft-docs check, Microsoft says Direct Lake semantic models must be created in the same region as the data source workspace, and lakehouse deployment pipelines create a new empty lakehouse in the target workspace unless dependency mapping is configured (&lt;a href=&quot;https://learn.microsoft.com/en-us/fabric/fundamentals/direct-lake-overview&quot;&gt;Direct Lake overview&lt;/a&gt;, &lt;a href=&quot;https://learn.microsoft.com/en-us/fabric/data-engineering/lakehouse-git-deployment-pipelines&quot;&gt;lakehouse deployment pipelines&lt;/a&gt;). A unified platform still has seams.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;My fit test is simple: Fabric belongs where it removes handoffs; I pull it out of the critical path when storage mode, shortcut identity, or deployment behavior would otherwise stay implicit.&lt;/p&gt;
&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
&lt;p&gt;This is the checklist I want before I trust a Fabric-backed report that combines purchase orders, inventory exposure, and shipment status:&lt;/p&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;
















&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Field&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Critical report&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;weekly cash + inventory exposure&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Metric the CFO will challenge first&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;open purchase-order cash plus on-hand inventory value&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;


































&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Review step&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;What I confirm&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Status&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Land raw ERP and warehouse feeds in a Lakehouse.&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Fabric fit: pipelines, Spark, Delta tables, OneLake storage
Boundary: raw tables are replayable inputs, not finance-ready outputs&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;—&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Normalize the trusted reporting tables before they reach the semantic model.&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Fabric fit: stage and curate in Lakehouse and/or Warehouse
Explicit checks: grain uniqueness, null checks on cost and quantity, relationship checks to item and supplier dimensions&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;—&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Serve finance-facing tables from a Warehouse.&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Fabric fit: T-SQL, structured analytics, Power BI-friendly serving
Boundary: the same region and storage-mode rules above still apply&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;—&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Use a shortcut for supplier master only if the dependency is documented.&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Fabric fit: zero-copy access to a shared domain dataset
Boundary: if the target moves or permissions diverge, the failure appears upstream of the report&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;—&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Publish the semantic model with an intentional Direct Lake choice.&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Preferred path: Direct Lake on OneLake when OneLake security and in-memory behavior matter
Warning: Direct Lake on SQL is the deliberate path when SQL endpoint checks or DirectQuery fallback belong in the design&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;—&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;That is the boundary I care about.&lt;/p&gt;
&lt;p&gt;Fabric does useful work in this path. OneLake removes duplicate storage conversations. Lakehouse plus Warehouse narrow the handoff between engineering and reporting surfaces.&lt;/p&gt;
&lt;p&gt;I still do not want the critical path to depend on “we assume Fabric handles that.”&lt;/p&gt;
&lt;p&gt;If the team needs SQL views in the semantic-model path, or depends on SQL-based row security, I want that named because the Direct Lake behavior changes.&lt;/p&gt;
&lt;p&gt;If a shortcut points to another workspace, I want the source owner and permission path written down before the report is called trusted.&lt;/p&gt;
&lt;h2 id=&quot;what-i-keep-explicit&quot;&gt;What I keep explicit&lt;/h2&gt;
&lt;p&gt;This is the part I do not hand to the platform story.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Finance analytics owns metric definitions and semantic-model signoff.&lt;/li&gt;
&lt;li&gt;Data platform owns storage mode, region, security mode, and deployment mapping.&lt;/li&gt;
&lt;li&gt;Trusted tables still need grain, null, and relationship checks before they reach finance-facing measures.&lt;/li&gt;
&lt;li&gt;Shortcut targets, target permissions, and the first expected failure surface should be written down before the dataset joins a critical report.&lt;/li&gt;
&lt;li&gt;If deployment pipelines create a new empty lakehouse or remap dependencies, that validation should happen before a finance-facing promotion is called routine.&lt;/li&gt;
&lt;/ul&gt;
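&lt;p&gt;The grain, null, and relationship checks above can be sketched as three T-SQL queries against the Warehouse. This is a minimal sketch, not a prescribed test suite: &lt;code&gt;trusted.po_lines&lt;/code&gt;, &lt;code&gt;trusted.dim_supplier&lt;/code&gt;, and every column name are hypothetical stand-ins for whatever the trusted reporting tables are actually called.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- Hypothetical names throughout: trusted.po_lines, trusted.dim_supplier.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- Grain check: 0 means one row per po_line_id and no null keys.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;select count(*) - count(distinct po_line_id) as duplicate_or_null_keys&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;from trusted.po_lines;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- Null check: count(column) skips nulls, so each difference is the null count.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  count(*) - count(unit_cost) as null_cost_rows,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  count(*) - count(quantity) as null_quantity_rows&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;from trusted.po_lines;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- Relationship check: rows whose supplier_id is missing or has no dimension match.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;select count(*) as unresolved_supplier_rows&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;from trusted.po_lines as p&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;left join trusted.dim_supplier as s&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  on p.supplier_id = s.supplier_id&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;where s.supplier_id is null;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each query returns a count that should be zero before promotion. Where the queries run, whether in a pipeline step, a notebook, or a dbt test, is a team choice this sketch does not settle.&lt;/p&gt;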
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: the team assumes Fabric replaces semantic ownership, data contracts, and model tests because the stack is unified → Mitigation: keep platform fit and trust ownership as separate decisions, and name the owner of each.&lt;/li&gt;
&lt;li&gt;Breaks when: Direct Lake is treated like one stable mode regardless of views, SQL endpoint checks, or security choices → Mitigation: choose Direct Lake on OneLake or Direct Lake on SQL deliberately, and document where fallback can occur.&lt;/li&gt;
&lt;li&gt;Breaks when: shortcuts are sold as transparent zero-copy access with no downside → Mitigation: document the shortcut owner, target, permission path, and first failure surface before the dataset joins a critical report.&lt;/li&gt;
&lt;li&gt;Breaks when: deployment or region assumptions are treated as routine plumbing → Mitigation: review same-region limits, metadata-only deployment behavior, and target validation before promotion.&lt;/li&gt;
&lt;li&gt;Breaks when: the team forces the wrong Fabric store into the critical path because Fabric feels unified → Mitigation: keep Eventhouse for event workloads, SQL database for transactions, and Lakehouse plus Warehouse for most reporting.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; For one finance or operations report your team treats as critical, write which part Fabric can simplify, which part still needs an explicit owner, and which failure surface must stay visible.&lt;/p&gt;
&lt;p&gt;Can the team explain the Direct Lake choice, shortcut dependency, and warehouse boundary before the report becomes trusted?&lt;/p&gt;</content:encoded><category>data-platforms</category><category>data-reliability</category><category>operations</category></item><item><title>The Snowflake design choices that make downstream models easier to trust</title><link>https://berhanturkkaynagi.com/blog/posts/snowflake-design-choices-downstream-trust/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/snowflake-design-choices-downstream-trust/</guid><description>Snowflake models stay easier to trust when raw landing absorbs source drift, staged tables normalize names and types, and curated models keep row meaning, joins, and recovery boundaries explicit.</description><pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A Snowflake model gets hard to trust when one source change has no obvious place to stop.&lt;/p&gt;
&lt;p&gt;A feed sends &lt;code&gt;quantity&lt;/code&gt; as text, a timestamp arrives in a new format, a nested location field appears, and suddenly every downstream join has to decide what &lt;code&gt;orders&lt;/code&gt; means again.&lt;/p&gt;
&lt;p&gt;I do not use raw, stage, and curated layers as warehouse ceremony. I use them to isolate change: raw preserves source fidelity, stage normalizes names and types, and curated locks row meaning plus safe joins.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;A broken query is only the visible symptom.&lt;/p&gt;
&lt;p&gt;The deeper problem is a warehouse where nobody can say which layer owns source fidelity, which layer owns type cleanup, or which table a planner should actually trust.&lt;/p&gt;
&lt;p&gt;That blur turns one source change into extra joins, wider backfills, and longer incident notes.&lt;/p&gt;
&lt;p&gt;If raw, stage, and curated objects all carry half-cleaned business logic, every upstream drift leaks further than it should.&lt;/p&gt;
&lt;p&gt;The source contract still matters, so I still want &lt;a href=&quot;/blog/posts/minimum-data-contract-source-table/&quot;&gt;the source table expectations written down first&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I also want &lt;a href=&quot;/blog/posts/every-important-model-needs-an-explicit-grain/&quot;&gt;the curated model grain stated explicitly&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Both disciplines get easier to sustain when Snowflake itself has clear layer jobs.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Keep raw landing in an explicit raw schema and keep it close to source shape so replay, audit, and file-level evolution stay possible.&lt;/li&gt;
&lt;li&gt;Use staged tables to normalize names, cast strings into typed timestamps and numbers, and flatten the predictable parts of semi-structured payloads.&lt;/li&gt;
&lt;li&gt;Use curated models to freeze business grain, safe joins, and reader-facing measures so downstream users do not reason about source drift directly.&lt;/li&gt;
&lt;li&gt;Make the layer obvious from the fully qualified name. If &lt;code&gt;orders&lt;/code&gt; exists in three places, I want &lt;code&gt;analytics.raw.order_events&lt;/code&gt;, &lt;code&gt;analytics.stage.order_lines_typed&lt;/code&gt;, and &lt;code&gt;analytics.curated.fct_order_lines&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Declare keys and relationships as metadata on trusted tables when they help humans and BI tools understand the join path. Stay honest about what that buys: Snowflake treats primary-key and foreign-key constraints on standard tables as informational metadata, not enforced integrity (&lt;a href=&quot;https://docs.snowflake.com/en/user-guide/table-considerations&quot;&gt;table design guidance&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Treat transient storage, schema evolution, and clustering as bounded tools, not the default trust pattern.&lt;/li&gt;
&lt;/ul&gt;
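&lt;p&gt;Because those key declarations are not enforced on standard tables, one option is to pair each declaration with a query that checks it. The sketch below is illustrative only: the constraint names are hypothetical, and it assumes a dimension called &lt;code&gt;analytics.curated.dim_locations&lt;/code&gt; exists and already declares &lt;code&gt;location_id&lt;/code&gt; as its primary key.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- Declare the join path as metadata. Constraint names are hypothetical.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;alter table analytics.curated.fct_order_lines&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  add constraint pk_fct_order_lines primary key (order_line_id);&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- Assumes dim_locations already declares location_id as its primary key.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;alter table analytics.curated.fct_order_lines&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  add constraint fk_fct_order_lines_location&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  foreign key (location_id)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  references analytics.curated.dim_locations (location_id);&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- Grain check: 0 means one row per order_line_id and no null keys.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;select count(*) - count(distinct order_line_id) as duplicate_or_null_keys&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;from analytics.curated.fct_order_lines;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- Relationship check: 0 means every non-null location_id resolves.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;select count(*) as orphan_rows&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;from analytics.curated.fct_order_lines f&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;left join analytics.curated.dim_locations d&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  on f.location_id = d.location_id&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;where f.location_id is not null&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  and d.location_id is null;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The declarations document intent for readers and BI tools. The two select statements are what would actually catch a duplicate or an orphaned row.&lt;/p&gt;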
&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
&lt;p&gt;Imagine an order-and-inventory feed lands in Snowflake every hour.&lt;/p&gt;
&lt;p&gt;One morning the upstream export changes in three ways:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Source change&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- quantity: &quot;12&quot; instead of 12&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- event_timestamp: &quot;02/17/2026 05:14:08 -0400&quot; instead of ISO 8601&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- location.attributes.zone: new nested attribute&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If that change leaks straight into the trusted model, three downstream problems show up at once.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Quantity math becomes less safe.&lt;/li&gt;
&lt;li&gt;Timestamp filters and backfill windows get ambiguous.&lt;/li&gt;
&lt;li&gt;Location joins start depending on a nested payload shape instead of a stable typed column.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is the smallest boundary note I want instead:&lt;/p&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;




















&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Field&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;RAW&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;analytics.raw.order_events
keep landed payload in VARIANT
preserve source naming for replay and audit
allow schema evolution here only for controlled file loads, with &lt;code&gt;ENABLE_SCHEMA_EVOLUTION&lt;/code&gt;, &lt;code&gt;MATCH_BY_COLUMN_NAME&lt;/code&gt;, and the table boundary documented&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;STAGE&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;analytics.stage.order_lines_typed
expose order_id, order_line_id, sku_id, location_id
quantity NUMBER(38,0)
event_ts TIMESTAMP_TZ
location_zone VARCHAR&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;CURATED&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;analytics.curated.fct_order_lines
grain: one row per order_line_id
safe joins: dim_dates on order_date, dim_locations on location_id
lifecycle: permanent unless recovery boundaries are documented otherwise&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The staged model is where I want the ugly conversion work:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  payload:order_id::&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;varchar&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; order_id,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  payload:line_id::&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;varchar&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; order_line_id,&lt;/span&gt;&lt;/span&gt;
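&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  try_to_number(payload:quantity::&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;varchar&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;) &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; quantity,&lt;/span&gt;&lt;/span&gt;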
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  try_to_timestamp_tz(&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;    payload:event_timestamp::&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;varchar&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt;    &apos;MM/DD/YYYY HH24:MI:SS TZHTZM&apos;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  ) &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; event_ts,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  payload:&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;location&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;:id::&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;varchar&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; location_id,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  payload:&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;location&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;:attributes:&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;zone&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;::&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;varchar&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; location_zone&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; analytics&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;raw&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.order_events;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That is the stage boundary doing its job.&lt;/p&gt;
&lt;p&gt;Raw stays faithful to the source. Stage holds the typing and flattening work. Curated facts do not need to reinterpret the payload every time the feed drifts.&lt;/p&gt;
&lt;p&gt;If &lt;code&gt;try_to_number(payload:quantity)&lt;/code&gt; starts returning &lt;code&gt;NULL&lt;/code&gt;, I want that failure to surface in stage, not inside a curated fact with string-shaped quantity logic.&lt;/p&gt;
&lt;p&gt;I want the same boundary for timestamps. If the feed is not ISO 8601, stage should parse it with an explicit format instead of relying on session settings.&lt;/p&gt;
&lt;p&gt;On the curated side, I still want row meaning frozen. &lt;code&gt;fct_order_lines&lt;/code&gt; stays one row per &lt;code&gt;order_line_id&lt;/code&gt;. &lt;code&gt;fct_inventory_snapshots&lt;/code&gt; stays at its own declared grain.&lt;/p&gt;
&lt;p&gt;A backfill or incident note should point to the staged normalization boundary, not force both facts to reinterpret raw payloads on the fly.&lt;/p&gt;
&lt;p&gt;I also declare the trusted join path there, using primary and foreign key constraints. On standard Snowflake tables, that metadata does not enforce integrity, but it still makes joins to dates, locations, and inventory snapshots easier to review.&lt;/p&gt;
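&lt;p&gt;A minimal sketch of that declaration, assuming the curated tables live in a hypothetical &lt;code&gt;analytics.curated&lt;/code&gt; schema next to a hypothetical &lt;code&gt;dim_locations&lt;/code&gt; dimension. The schema, dimension, and constraint names are illustrative only:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A737D;--shiki-dark:#6A737D&quot;&gt;-- Hypothetical names throughout. On standard tables these keys are metadata, not enforcement.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;alter table analytics.curated.dim_locations&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  add constraint pk_dim_locations primary key (location_id);&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;alter table analytics.curated.fct_order_lines&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  add constraint pk_fct_order_lines primary key (order_line_id);&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A737D;--shiki-dark:#6A737D&quot;&gt;-- Documents the join path from order lines to locations for reviewers and tooling.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;alter table analytics.curated.fct_order_lines&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  add constraint fk_order_lines_location&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  foreign key (location_id) references analytics.curated.dim_locations (location_id);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because nothing here is enforced, the uniqueness and orphan checks still belong in model logic and tests.&lt;/p&gt;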
&lt;p&gt;I keep the storage boundary explicit too.&lt;/p&gt;
&lt;p&gt;If the raw landing table is reconstructable from external files, I might accept transient storage there. I do not apply the same default to curated facts or dimensions.&lt;/p&gt;
&lt;p&gt;Snowflake documents that transient tables have no Fail-safe, so I only use them where loss is acceptable or reconstruction is already documented (&lt;a href=&quot;https://docs.snowflake.com/en/user-guide/tables-temp-transient&quot;&gt;temporary and transient tables&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;I keep the same boundary on schema evolution. If I enable &lt;code&gt;ENABLE_SCHEMA_EVOLUTION&lt;/code&gt;, I want it on the raw file-load table that is supposed to absorb column additions. Snowflake limits automatic schema evolution to file loads and Snowpipe with &lt;code&gt;MATCH_BY_COLUMN_NAME&lt;/code&gt;; it can add columns and drop &lt;code&gt;NOT NULL&lt;/code&gt; when new files omit a field, which is exactly why I keep it out of curated facts (&lt;a href=&quot;https://docs.snowflake.com/en/user-guide/data-load-schema-evolution&quot;&gt;schema evolution docs&lt;/a&gt;).&lt;/p&gt;
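&lt;p&gt;A minimal sketch of where that switch would sit, assuming a hypothetical raw table &lt;code&gt;analytics.raw.order_events_files&lt;/code&gt; loaded from a hypothetical stage &lt;code&gt;analytics.raw.order_events_stage&lt;/code&gt; that holds Parquet files:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A737D;--shiki-dark:#6A737D&quot;&gt;-- Table, stage, and file format are assumptions for illustration.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A737D;--shiki-dark:#6A737D&quot;&gt;-- Only the raw file-load table is allowed to absorb new columns.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;alter table analytics.raw.order_events_files set enable_schema_evolution = true;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A737D;--shiki-dark:#6A737D&quot;&gt;-- Schema evolution applies to loads that match file columns by name.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;copy into analytics.raw.order_events_files&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  from @analytics.raw.order_events_stage&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  file_format = (type = parquet)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  match_by_column_name = case_insensitive;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Stage and curated tables keep the parameter off, so a new source column has to arrive there through a reviewed change.&lt;/p&gt;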
&lt;p&gt;I keep clustering just as narrow.&lt;/p&gt;
&lt;p&gt;Snowflake already micro-partitions automatically, and its table design guidance says clustering is unnecessary for most tables and usually only worth revisiting when large tables spend real time scanning on a query path that differs from load order (&lt;a href=&quot;https://docs.snowflake.com/en/user-guide/table-considerations&quot;&gt;table design guidance&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Until then, clear layer ownership earns more trust than premature storage tuning.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: raw starts carrying business logic because the team is afraid to create another table → Mitigation: keep raw faithful to source shape and move typed cleanup plus reusable keys into stage.&lt;/li&gt;
&lt;li&gt;Breaks when: staged models leave predictable timestamps, numbers, or join attributes trapped inside &lt;code&gt;VARIANT&lt;/code&gt; → Mitigation: flatten and type the fields the team actually filters, joins, or backfills against.&lt;/li&gt;
&lt;li&gt;Breaks when: I pretend Snowflake constraints are enforcing integrity on standard tables → Mitigation: use primary and foreign keys as metadata for legibility and tooling, then keep the actual trust checks in model logic and tests.&lt;/li&gt;
&lt;li&gt;Breaks when: automatic schema evolution in file loads becomes permission to let curated models drift silently → Mitigation: let raw absorb evolving files, but require staged and curated changes to stay deliberate and reviewable.&lt;/li&gt;
&lt;li&gt;Breaks when: transient tables become the default for long-lived trusted models → Mitigation: reserve transient storage for scratch or reconstructable layers and keep trusted curated models on the safer lifecycle.&lt;/li&gt;
&lt;li&gt;Breaks when: I treat clustering as part of the base design pattern → Mitigation: keep it as a late optimization for large scan-heavy tables instead of mixing it into the minimum trust boundary.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; For one source change your team has already seen in Snowflake, write where the change should stop: raw replay, staged typing, curated grain, or recovery boundary.&lt;/p&gt;
&lt;p&gt;The boundary review is useful when it explains why raw, stage, and curated layers disagree before trust erodes downstream.&lt;/p&gt;</content:encoded><category>snowflake</category><category>data-platforms</category><category>data-modeling</category><category>data-reliability</category></item><item><title>What I watch in Snowflake before compute cost becomes a surprise</title><link>https://berhanturkkaynagi.com/blog/posts/snowflake-compute-cost-checks/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/snowflake-compute-cost-checks/</guid><description>I investigate Snowflake compute spikes in a fixed order: warehouse metering, idle gap, load, query history, later attribution, and pruning evidence before I start tuning.</description><pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Snowflake compute gets expensive fast when one warehouse bill jumps and nobody can say whether the extra credits came from idle minutes, warehouse pressure, or one recurring query family.&lt;/p&gt;
&lt;p&gt;That is when bad triage starts. Someone reaches for a bigger warehouse. Someone else opens one loud query profile. The basic question is still unsettled.&lt;/p&gt;
&lt;p&gt;I do not start with resizing or rewriting. I start with one warehouse and one question at a time.&lt;/p&gt;
&lt;p&gt;My first pass stays fixed: metering, idle gap, load, query history, lagged attribution, then pruning if scans still look wrong.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;The expensive mistake is not only a higher warehouse bill.&lt;/p&gt;
&lt;p&gt;It is asking the wrong question first.&lt;/p&gt;
&lt;p&gt;I see teams jump from “this warehouse cost more” to “rewrite that query” before anyone checks whether the warehouse was mostly busy, mostly idle, or stuck resuming.&lt;/p&gt;
&lt;p&gt;That is how one noisy morning turns into a week of unfocused tuning.&lt;/p&gt;
&lt;p&gt;Per-query attribution creates a second trap. As of 2026-02-10, Snowflake documents that &lt;code&gt;QUERY_ATTRIBUTION_HISTORY&lt;/code&gt; can lag by up to eight hours, excludes warehouse idle time, and omits very short queries (&lt;code&gt;&amp;#x3C;= ~100ms&lt;/code&gt;), so I treat it as a later confirmation layer instead of the first read (&lt;a href=&quot;https://docs.snowflake.com/en/sql-reference/account-usage/query_attribution_history&quot;&gt;Snowflake docs&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;If I wait for it before same-day triage, I lose the fast path. If I treat it like the whole bill, I blame one query family for cost it did not fully own.&lt;/p&gt;
&lt;p&gt;This is narrower than &lt;a href=&quot;/blog/posts/pipeline-observability-before-more-orchestration/&quot;&gt;Five pipeline observability signals before more orchestration&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here I am not asking whether the pipeline is healthy. I am asking why one warehouse got expensive and which layer earns the next ten minutes.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Start with &lt;code&gt;WAREHOUSE_METERING_HISTORY&lt;/code&gt;. Confirm which warehouse changed and whether the jump is new, recurring, or already normalizing.&lt;/li&gt;
&lt;li&gt;Compare &lt;code&gt;credits_used_compute&lt;/code&gt; with &lt;code&gt;credits_attributed_compute_queries&lt;/code&gt; before blaming one query family. The gap is where warehouse idle time starts to show up.&lt;/li&gt;
&lt;li&gt;Read &lt;code&gt;WAREHOUSE_LOAD_HISTORY&lt;/code&gt; next. I want to know whether the warehouse was busy, overloaded, queued during provisioning, or blocked.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;QUERY_HISTORY&lt;/code&gt; for the same-day drill-down. That is where I inspect queue time, spill, scan volume, cache use, and partitions scanned.&lt;/li&gt;
&lt;li&gt;Return to &lt;code&gt;QUERY_ATTRIBUTION_HISTORY&lt;/code&gt; later. I use it to confirm which query family actually consumed the compute credits once the lagged data lands.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;TABLE_QUERY_PRUNING_HISTORY&lt;/code&gt; only if the expensive pattern still points to unnecessary scans. That answers a different question from metering or attribution.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All five views live in Account Usage. As of 2026-02-10, Snowflake lists &lt;code&gt;WAREHOUSE_METERING_HISTORY&lt;/code&gt;, &lt;code&gt;WAREHOUSE_LOAD_HISTORY&lt;/code&gt;, and &lt;code&gt;TABLE_QUERY_PRUNING_HISTORY&lt;/code&gt; under &lt;code&gt;USAGE_VIEWER&lt;/code&gt;, &lt;code&gt;QUERY_HISTORY&lt;/code&gt; under &lt;code&gt;GOVERNANCE_VIEWER&lt;/code&gt;, and &lt;code&gt;QUERY_ATTRIBUTION_HISTORY&lt;/code&gt; under either role in &lt;a href=&quot;https://docs.snowflake.com/en/sql-reference/account-usage&quot;&gt;Account Usage&lt;/a&gt;. The order still matters more than memorizing every column.&lt;/p&gt;
&lt;p&gt;If the question shifts from “why did this warehouse get expensive?” to “what cloud services actually billed?”, I step out to &lt;code&gt;METERING_DAILY_HISTORY&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The warehouse metering views tell me consumed credits first. That is the right triage start, but it is not the whole billed-cloud-services answer.&lt;/p&gt;
&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
&lt;p&gt;Imagine a daily transform warehouse named &lt;code&gt;TRANSFORM_DAILY_WH&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;A model fan-out ships in the same morning release. The warehouse that usually burns a steady amount of compute between &lt;code&gt;05:45 ET&lt;/code&gt; and &lt;code&gt;07:00 ET&lt;/code&gt; suddenly costs more than twice its normal run.&lt;/p&gt;
&lt;p&gt;I do not want a dashboard first.&lt;/p&gt;
&lt;p&gt;I want one investigation note I can scan in order:&lt;/p&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;
&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Field&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Warehouse&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;TRANSFORM_DAILY_WH&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Window&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;2026-02-10 05:45-07:00 ET&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;1. WAREHOUSE_METERING_HISTORY&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;credits_used_compute: 18.4
credits_attributed_compute_queries: 10.9
idle compute gap: 7.5&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;2. WAREHOUSE_LOAD_HISTORY&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;avg_running elevated during 06:10-06:40
avg_queued_load near zero
avg_queued_provisioning spikes at warehouse resume
avg_blocked negligible&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;3. QUERY_HISTORY&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;one query_parameterized_hash dominates bytes_scanned
queued_overload_time low
bytes_spilled_to_remote_storage high
percentage_scanned_from_cache low
partitions_scanned jumped versus the prior run&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;4. QUERY_ATTRIBUTION_HISTORY (later check)&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;same query family accounts for 62% of attributed compute
attribution confirms the suspect pattern, not the full warehouse bill&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;5. TABLE_QUERY_PRUNING_HISTORY&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;affected fact table pruning_ratio: 0.18
partitions_scanned_per_query far above the recent baseline&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Decision&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;do not resize first
fix the scan-heavy query pattern
shorten idle time on this task warehouse instead of paying for empty minutes&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;That note tells me where the next ten minutes should go.&lt;/p&gt;
&lt;p&gt;The metering lines tell me the warehouse did use more compute, but not all of it was query-attributed. That is my cue to keep both workload and idle time in view.&lt;/p&gt;
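&lt;p&gt;A minimal sketch of the metering read behind those lines. The warehouse name and date come from the example; the hourly window boundaries and the &lt;code&gt;idle_compute_gap&lt;/code&gt; alias are my assumptions:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A737D;--shiki-dark:#6A737D&quot;&gt;-- Metering rows for one warehouse; the gap column approximates idle compute.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  start_time,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  credits_used_compute,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  credits_attributed_compute_queries,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  credits_used_compute - credits_attributed_compute_queries as idle_compute_gap&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;from snowflake.account_usage.warehouse_metering_history&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;where warehouse_name = &apos;TRANSFORM_DAILY_WH&apos;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A737D;--shiki-dark:#6A737D&quot;&gt;  -- Assumed window: the hours that cover the 05:45-07:00 ET run.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  and start_time &amp;#x3E;= &apos;2026-02-10T05:00:00-05:00&apos;::timestamp_ltz&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  and start_time &amp;#x3C; &apos;2026-02-10T08:00:00-05:00&apos;::timestamp_ltz&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;order by start_time;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;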
&lt;p&gt;The load lines tell me the warehouse was not overloaded during the morning window. &lt;code&gt;avg_queued_load&lt;/code&gt; stays low. &lt;code&gt;avg_queued_provisioning&lt;/code&gt; spikes around resume, but it does not explain the whole bill.&lt;/p&gt;
&lt;p&gt;That keeps me from resizing first.&lt;/p&gt;
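&lt;p&gt;The load read can be sketched the same way. The four load columns are the ones named in the note; the exact window boundaries are an assumption based on the example run:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A737D;--shiki-dark:#6A737D&quot;&gt;-- Running, queued, provisioning, and blocked load for the example window.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  start_time,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  avg_running,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  avg_queued_load,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  avg_queued_provisioning,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  avg_blocked&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;from snowflake.account_usage.warehouse_load_history&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;where warehouse_name = &apos;TRANSFORM_DAILY_WH&apos;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  and start_time &amp;#x3E;= &apos;2026-02-10T05:45:00-05:00&apos;::timestamp_ltz&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  and start_time &amp;#x3C; &apos;2026-02-10T07:00:00-05:00&apos;::timestamp_ltz&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;order by start_time;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;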
&lt;p&gt;Then I use &lt;code&gt;QUERY_HISTORY&lt;/code&gt; as the fast path. I group the repeat pattern with &lt;code&gt;query_parameterized_hash&lt;/code&gt;, then inspect queue time, spill, cache use, and partitions scanned.&lt;/p&gt;
&lt;p&gt;In this case one query family is scanning more, spilling more, and using less cache than the earlier baseline.&lt;/p&gt;
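&lt;p&gt;A sketch of that grouping, assuming the same warehouse and window as the note. The aliases and the top-ten cutoff are mine, not part of any Snowflake view:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A737D;--shiki-dark:#6A737D&quot;&gt;-- One row per query family, ranked by how much data it scanned.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  query_parameterized_hash,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  count(*) as runs,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  sum(bytes_scanned) as total_bytes_scanned,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  sum(bytes_spilled_to_remote_storage) as total_remote_spill_bytes,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  sum(queued_overload_time) as total_queued_overload_ms,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  avg(percentage_scanned_from_cache) as avg_cache_share,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  sum(partitions_scanned) as total_partitions_scanned&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;from snowflake.account_usage.query_history&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;where warehouse_name = &apos;TRANSFORM_DAILY_WH&apos;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  and start_time &amp;#x3E;= &apos;2026-02-10T05:45:00-05:00&apos;::timestamp_ltz&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  and start_time &amp;#x3C; &apos;2026-02-10T07:00:00-05:00&apos;::timestamp_ltz&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;group by query_parameterized_hash&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;order by total_bytes_scanned desc&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;limit 10;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running the same query against the prior day&apos;s window gives the baseline I compare against.&lt;/p&gt;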
&lt;p&gt;Later, once &lt;code&gt;QUERY_ATTRIBUTION_HISTORY&lt;/code&gt; catches up, I can confirm that the same family consumed most of the attributed compute credits.&lt;/p&gt;
&lt;p&gt;That confirmation matters, but it is the later layer. It still does not explain the warehouse idle gap.&lt;/p&gt;
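&lt;p&gt;A sketch of the later confirmation read, assuming the lagged rows for the window have landed. The share is computed against attributed compute only, so by construction it says nothing about idle credits; the aliases are mine:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A737D;--shiki-dark:#6A737D&quot;&gt;-- Attributed credits per query family and each family&apos;s share of the attributed total.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  query_parameterized_hash,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  sum(credits_attributed_compute) as attributed_credits,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A737D;--shiki-dark:#6A737D&quot;&gt;  -- nullif guards against dividing by zero when nothing was attributed.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  sum(credits_attributed_compute)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;    / nullif(sum(sum(credits_attributed_compute)) over (), 0) as share_of_attributed&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;from snowflake.account_usage.query_attribution_history&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;where warehouse_name = &apos;TRANSFORM_DAILY_WH&apos;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  and start_time &amp;#x3E;= &apos;2026-02-10T05:45:00-05:00&apos;::timestamp_ltz&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  and start_time &amp;#x3C; &apos;2026-02-10T07:00:00-05:00&apos;::timestamp_ltz&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;group by query_parameterized_hash&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;order by attributed_credits desc;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;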
&lt;p&gt;Finally, &lt;code&gt;TABLE_QUERY_PRUNING_HISTORY&lt;/code&gt; gives me the scan-efficiency answer. A low pruning ratio and high partitions scanned per query tell me this is not just a big query. It is a wasteful scan pattern.&lt;/p&gt;
&lt;p&gt;My default decision here is boring on purpose.&lt;/p&gt;
&lt;p&gt;I would not resize first because queue overload stayed low.&lt;/p&gt;
&lt;p&gt;I would tighten the scan-heavy pattern, keep the warehouse idle gap visible, and check whether the task warehouse is sitting around longer than the workload justifies.&lt;/p&gt;
&lt;p&gt;That boundary matters. As of 2026-02-10, Snowflake says suspending a warehouse drops its cache, recommends immediate suspension for tasks, and suggests at least 10 minutes for BI or SELECT-heavy warehouses that benefit from cache warmth (&lt;a href=&quot;https://docs.snowflake.com/en/user-guide/performance-query-warehouse-cache&quot;&gt;warehouse cache guidance&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;In this example, the workload is a task warehouse. I care more about stopping empty minutes than preserving cache warmth between runs.&lt;/p&gt;
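&lt;p&gt;A minimal sketch of that change. The 60-second value is an illustrative assumption, not a number from Snowflake&apos;s guidance or from this example:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A737D;--shiki-dark:#6A737D&quot;&gt;-- auto_suspend is set in seconds; 60 is an assumed value for a task warehouse.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;alter warehouse TRANSFORM_DAILY_WH set auto_suspend = 60;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A737D;--shiki-dark:#6A737D&quot;&gt;-- Confirm the setting that is actually in effect.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;show warehouses like &apos;TRANSFORM_DAILY_WH&apos;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I would rerun the metering comparison after the next scheduled run to see whether the idle gap actually shrank.&lt;/p&gt;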
&lt;p&gt;If the spike followed a dbt release, I also want the release path to stay legible.&lt;/p&gt;
&lt;p&gt;That is why I keep the deployment record explicit in &lt;a href=&quot;/blog/posts/dbt-core-deployments-boring-production/&quot;&gt;How I keep dbt Core deployments boring in production&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: I start with &lt;code&gt;QUERY_ATTRIBUTION_HISTORY&lt;/code&gt; and treat it as the whole bill → Mitigation: keep warehouse metering and the idle gap ahead of per-query attribution.&lt;/li&gt;
&lt;li&gt;Breaks when: I rewrite SQL before checking whether the warehouse was overloaded, provisioning, or blocked → Mitigation: read &lt;code&gt;WAREHOUSE_LOAD_HISTORY&lt;/code&gt; before choosing sizing, scheduling, or query fixes.&lt;/li&gt;
&lt;li&gt;Breaks when: I use &lt;code&gt;QUERY_HISTORY&lt;/code&gt; as if it already contains attributed compute cost → Mitigation: use it for immediate query symptoms, then come back later for lagged credit attribution.&lt;/li&gt;
&lt;li&gt;Breaks when: I copy BI-oriented cache advice onto a task warehouse that should suspend quickly → Mitigation: match auto-suspend choices to the actual workload instead of one universal rule.&lt;/li&gt;
&lt;li&gt;Breaks when: I turn one warehouse investigation into a full Snowflake billing explainer → Mitigation: keep the post centered on one warehouse-spike path and use &lt;code&gt;METERING_DAILY_HISTORY&lt;/code&gt; only for the billed-cost boundary.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; For one Snowflake warehouse that feels expensive right now, write a six-line note: metering change, idle-cost split, queue state, top query pattern, workload class, and first fix.&lt;/p&gt;
&lt;p&gt;That note makes the first move less like generic tuning and more like a choice between sizing, scheduling, or workload separation.&lt;/p&gt;</content:encoded><category>snowflake</category><category>data-platforms</category><category>operations</category></item><item><title>The dbt tests I write first for business-critical models</title><link>https://berhanturkkaynagi.com/blog/posts/dbt-tests-business-critical-models/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/dbt-tests-business-critical-models/</guid><description>For a business-critical dbt model, I order the first tests by failure cost: grain uniqueness, decision-critical nulls, relationships, domain rules, and business assertions.</description><pubDate>Thu, 16 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I start here only after two earlier questions are settled: the &lt;a href=&quot;/blog/posts/minimum-data-contract-source-table/&quot;&gt;source contract&lt;/a&gt; exists, and the source freshness check is green.&lt;/p&gt;
&lt;p&gt;Then the question narrows: which model-level dbt tests would change my first response if the model started lying?&lt;/p&gt;
&lt;p&gt;Freshness is a separate source check, not one of dbt’s four built-in generic data tests.&lt;/p&gt;
&lt;p&gt;Once that gate is green, I want the smallest model-level test set that blocks the failures most likely to break joins, counts, statuses, or planner-facing quantities.&lt;/p&gt;
&lt;p&gt;On a business-critical model, the first ladder should catch duplicate rows, broken parent joins, invalid states, and one model-specific rule before the dashboard conversation starts.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Imagine &lt;code&gt;fct_purchase_order_lines&lt;/code&gt; feeds an operations dashboard that planners use to chase late supplier deliveries. The source lands on time. The model still builds. The dashboard still renders.&lt;/p&gt;
&lt;p&gt;A source-system fix quietly changes three things at once.&lt;/p&gt;
&lt;p&gt;A retry path duplicates some &lt;code&gt;purchase_order_line_id&lt;/code&gt; values, some rows keep a &lt;code&gt;purchase_order_id&lt;/code&gt; missing from the header model, and one status mapping starts writing &lt;code&gt;reopened&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;None of that requires a broken DAG to create a business problem.&lt;/p&gt;
&lt;p&gt;The failure mode I see is treating tests like a generic checklist instead of ordering them around the next decision.&lt;/p&gt;
&lt;p&gt;On a business-critical model, the first tests should tell me quickly whether counts, joins, and decision states are still safe.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Start after the source freshness check is green.&lt;/li&gt;
&lt;li&gt;Add the uniqueness check that enforces the declared grain first. If the grain is composite, expose a stable surrogate key and test that, or write a singular data test against the full key.&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;not_null&lt;/code&gt; only on fields that would break a real decision path if they disappeared, such as the parent key, quantity, effective date, or business status.&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;relationships&lt;/code&gt; where an orphaned record would create a business-facing mismatch between the model and the parent entity.&lt;/li&gt;
&lt;li&gt;Add one &lt;code&gt;accepted_values&lt;/code&gt; check or one custom singular data test for the business-state or range rule most likely to drift without breaking the SQL.&lt;/li&gt;
&lt;li&gt;Add one custom business-rule test for the highest-risk scenario the built-ins still miss, then stop before the suite turns into noise.&lt;/li&gt;
&lt;/ul&gt;
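&lt;p&gt;The last step of that ladder could look like the singular data test below. This is a sketch under stated assumptions: the file name is hypothetical, and the rule itself (no negative open quantity, no remaining open quantity on closed or cancelled lines) is an invented example of a business assertion. dbt treats a singular data test as failing when its query returns rows:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A737D;--shiki-dark:#6A737D&quot;&gt;-- tests/assert_open_quantity_matches_status.sql (hypothetical file name)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A737D;--shiki-dark:#6A737D&quot;&gt;-- Returns the rows that violate the assumed rule; any returned row fails the test.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  purchase_order_line_id,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  line_status,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  open_quantity&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;from {{ ref(&apos;fct_purchase_order_lines&apos;) }}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;where open_quantity &amp;#x3C; 0&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;   or (line_status in (&apos;closed&apos;, &apos;cancelled&apos;) and open_quantity &amp;#x3E; 0)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I keep it to one such test per model at first, so a failure points at a single rule the planners would recognize.&lt;/p&gt;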
&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
&lt;p&gt;This is the compact test ladder I would want on a planner-facing purchase-order-line model after the source freshness gate is already green:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;yaml&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#22863A;--shiki-dark:#85E89D&quot;&gt;models&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  - &lt;/span&gt;&lt;span style=&quot;color:#22863A;--shiki-dark:#85E89D&quot;&gt;name&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;: &lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt;fct_purchase_order_lines&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#22863A;--shiki-dark:#85E89D&quot;&gt;    columns&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;      - &lt;/span&gt;&lt;span style=&quot;color:#22863A;--shiki-dark:#85E89D&quot;&gt;name&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;: &lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt;po_line_grain_key&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#22863A;--shiki-dark:#85E89D&quot;&gt;        data_tests&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;          - &lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt;unique&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;          - &lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt;not_null&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;      - &lt;/span&gt;&lt;span style=&quot;color:#22863A;--shiki-dark:#85E89D&quot;&gt;name&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;: &lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt;purchase_order_id&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#22863A;--shiki-dark:#85E89D&quot;&gt;        data_tests&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;          - &lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt;not_null&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;          - &lt;/span&gt;&lt;span style=&quot;color:#22863A;--shiki-dark:#85E89D&quot;&gt;relationships&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#22863A;--shiki-dark:#85E89D&quot;&gt;              arguments&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#22863A;--shiki-dark:#85E89D&quot;&gt;                to&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;: &lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt;ref(&apos;fct_purchase_orders&apos;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#22863A;--shiki-dark:#85E89D&quot;&gt;                field&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;: &lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt;purchase_order_id&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;      - &lt;/span&gt;&lt;span style=&quot;color:#22863A;--shiki-dark:#85E89D&quot;&gt;name&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;: &lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt;line_status&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#22863A;--shiki-dark:#85E89D&quot;&gt;        data_tests&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;          - &lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt;not_null&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;          - &lt;/span&gt;&lt;span style=&quot;color:#22863A;--shiki-dark:#85E89D&quot;&gt;accepted_values&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#22863A;--shiki-dark:#85E89D&quot;&gt;              arguments&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#22863A;--shiki-dark:#85E89D&quot;&gt;                values&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;: [&lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt;&apos;open&apos;&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt;&apos;partial&apos;&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt;&apos;closed&apos;&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt;&apos;cancelled&apos;&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;      - &lt;/span&gt;&lt;span style=&quot;color:#22863A;--shiki-dark:#85E89D&quot;&gt;name&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;: &lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt;open_quantity&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#22863A;--shiki-dark:#85E89D&quot;&gt;        data_tests&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;          - &lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt;not_null&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That order is deliberate.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;unique&lt;/code&gt; on &lt;code&gt;po_line_grain_key&lt;/code&gt; goes first because one duplicated line can inflate open quantity, fan out downstream joins, and make planners think more material is outstanding than there really is.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;not_null&lt;/code&gt; on &lt;code&gt;purchase_order_id&lt;/code&gt;, &lt;code&gt;line_status&lt;/code&gt;, and &lt;code&gt;open_quantity&lt;/code&gt; comes next because those fields decide whether the row can be joined, interpreted, or acted on.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;relationships&lt;/code&gt; on &lt;code&gt;purchase_order_id&lt;/code&gt; earns a slot because orphaned lines create a mismatch between the line model and the header view the business also reads.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;relationships&lt;/code&gt; excludes &lt;code&gt;NULL&lt;/code&gt; values by design, so I only trust it after I have decided whether nulls should fail separately.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;accepted_values&lt;/code&gt; on &lt;code&gt;line_status&lt;/code&gt; comes before a softer shape check because one invalid state can drive the wrong operational response even when the row count still looks normal.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Then I add one model-specific rule the built-ins will not catch:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A737D;--shiki-dark:#6A737D&quot;&gt;-- tests/open_quantity_never_negative_for_active_lines.sql&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;select&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; *&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; {{ ref(&lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt;&apos;fct_purchase_order_lines&apos;&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;) }}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;where&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; line_status &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;!=&lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt; &apos;cancelled&apos;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;  and&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; open_quantity &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;&amp;#x3C;&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; 0&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I keep this first set small on purpose.&lt;/p&gt;
&lt;p&gt;If &lt;code&gt;unique&lt;/code&gt; fails, I inspect retries, merge logic, or a bad intermediate join. If &lt;code&gt;relationships&lt;/code&gt; fails, I inspect parent load timing or the ref boundary. If &lt;code&gt;accepted_values&lt;/code&gt; fails, I inspect the latest status-mapping change.&lt;/p&gt;
&lt;p&gt;Each early test should narrow the first investigation step.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when the model has no declared grain or stable key yet → Mitigation: go back to &lt;a href=&quot;/blog/posts/every-important-model-needs-an-explicit-grain/&quot;&gt;the explicit grain note&lt;/a&gt; first, then let the first &lt;code&gt;unique&lt;/code&gt; test enforce that row meaning.&lt;/li&gt;
&lt;li&gt;Breaks when the real grain is composite and the suite checks one convenient column → Mitigation: expose a stable surrogate key or add a model-level assertion that matches the declared grain.&lt;/li&gt;
&lt;li&gt;Breaks when teams copy the same null and relationships tests onto every field → Mitigation: keep only the tests that would change the first response on this model.&lt;/li&gt;
&lt;li&gt;Breaks when &lt;code&gt;relationships&lt;/code&gt; sits on optional or noisy foreign keys and creates alert churn → Mitigation: reserve it for joins where orphaned records create a real business mismatch, and pair it with &lt;code&gt;not_null&lt;/code&gt; only when nulls should fail.&lt;/li&gt;
&lt;li&gt;Breaks when the built-ins all pass but the model still violates a business rule → Mitigation: add one custom test for the highest-risk scenario, such as negative open quantity or an impossible state transition.&lt;/li&gt;
&lt;li&gt;Breaks when the suite grows into dozens of low-value checks because the model is important → Mitigation: rank tests by failure cost and response path, then add depth only where the business risk justifies it.&lt;/li&gt;
&lt;/ul&gt;
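&lt;p&gt;For the composite-grain case, a model-level assertion could look like the sketch below. It assumes the declared grain is one row per purchase order and line number; &lt;code&gt;po_line_number&lt;/code&gt; and the test file name are hypothetical, not columns or files from the model above.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- tests/one_row_per_po_line.sql (hypothetical test name)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- Assumes the declared grain is purchase_order_id + po_line_number.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- po_line_number is an assumed column; swap in the real grain columns.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    purchase_order_id,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    po_line_number,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    count(*) as row_count&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;from {{ ref(&apos;fct_purchase_order_lines&apos;) }}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;group by purchase_order_id, po_line_number&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- Any returned row is a grain value that appears more than once.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;having count(*) &gt; 1&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;dbt fails a singular test when the query returns rows, so each row names a duplicated grain value and its &lt;code&gt;row_count&lt;/code&gt; shows how many copies to look for.&lt;/p&gt;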
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Pick one business-critical model, confirm the source freshness gate is green, and write the first four or five dbt tests that would change your first response if the model started lying tomorrow morning.&lt;/p&gt;
&lt;p&gt;I’d be glad to compare notes on any business-critical model where the test suite keeps growing but the first investigation step still isn’t clear.&lt;/p&gt;</content:encoded><category>analytics-engineering</category><category>data-quality</category><category>data-reliability</category></item><item><title>The operating spec I want before I trust a business-critical dashboard</title><link>https://berhanturkkaynagi.com/blog/posts/operating-spec-business-critical-dashboard/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/operating-spec-business-critical-dashboard/</guid><description>I trust a business-critical dashboard more when one short operating spec names the owner, source models, metric note, refresh cutoff, trusted validation slice, known boundaries, and failure path.</description><pubDate>Fri, 13 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A dashboard can appear in every executive review and still be hard to trust when it matters.&lt;/p&gt;
&lt;p&gt;If nobody can answer who owns it, when the number becomes safe to use, which slice to validate first, and what the fallback is when the refresh lands late, I do not treat it as production-ready.&lt;/p&gt;
&lt;p&gt;Before I trust a business-critical dashboard, I want one short operating spec beside it. I am not trying to add more documentation. I am trying to make the next decision safer when the number is late, disputed, or clearly wrong.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;A business-critical dashboard earns trust one review at a time, but it can lose that trust in one morning.&lt;/p&gt;
&lt;p&gt;I have seen teams hand off a dashboard with useful charts and no operating boundary anyone can point to. The finance lead assumes the number is safe by &lt;code&gt;07:30 ET&lt;/code&gt;. Analytics assumes everyone knows draft orders are excluded. Nobody can point to the owner, the trusted validation slice, or the fallback plan when the refresh misses its cutoff.&lt;/p&gt;
&lt;p&gt;That is how a dashboard can look finished while still behaving like an undocumented handoff. When the headline KPI is questioned, the room starts with interpretation instead of the first check.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Name the dashboard owner and the business owner separately. I want one person accountable for the data path and one person accountable for definition decisions.&lt;/li&gt;
&lt;li&gt;Write the primary decision the dashboard supports. If I cannot say what meeting or review it is for, the rest of the spec gets vague fast.&lt;/li&gt;
&lt;li&gt;List the source models that feed the headline KPI, not every upstream table in the warehouse.&lt;/li&gt;
&lt;li&gt;Add one short metric note with the key exclusions, boundary conditions, and definition-change owner.&lt;/li&gt;
&lt;li&gt;Write the refresh expectation as a real operating window: source landing time, dashboard refresh time, and the point where I call the dashboard stale.&lt;/li&gt;
&lt;li&gt;Record one trusted validation slice I can check quickly when the number is challenged.&lt;/li&gt;
&lt;li&gt;Write the failure path: first check, second check, safe fallback, and who posts the update.&lt;/li&gt;
&lt;li&gt;Keep the spec beside the dashboard or release path, and update it when the dashboard changes. If it lives in a stale wiki page, it is already failing.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;example-the-one-page-operating-spec-i-want-beside-the-dashboard&quot;&gt;Example: the one-page operating spec I want beside the dashboard&lt;/h2&gt;
&lt;p&gt;Here is the kind of spec I want attached to an executive KPI scorecard before the weekly review depends on it:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Dashboard operating spec&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Dashboard: executive KPI scorecard&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Dashboard owner: analytics engineering&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Business owner: finance director&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Primary decision: weekly executive review of revenue, margin, and order health&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Source models&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- mart_finance_daily_kpis&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- fct_orders&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- dim_customers&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Headline metric note&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Settled net revenue&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Excludes sandbox, QA, and fully refunded orders&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Finance is the decision owner for definition changes&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Refresh expectation&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Source landing complete by 06:30 ET&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Dashboard refresh complete by 07:15 ET&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Treat the dashboard as stale after 07:30 ET unless the owner posts an update&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Trusted validation slice&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- US enterprise orders&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Last complete business day&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Compare to the settled finance snapshot when the headline KPI is questioned&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Known boundaries&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Intra-day order activity is incomplete before the morning cutoff&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Draft orders are intentionally excluded&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Margin is directional until freight adjustments settle&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Failure path&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- First check: source freshness and latest successful publish&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Second check: validation slice against the settled snapshot&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Safe fallback: use yesterday&apos;s settled KPI snapshot in the review until the issue is resolved&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- Communication path: owner posts a short incident note with next update time&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each line in that spec changes the next decision.&lt;/p&gt;
&lt;p&gt;I do not need a big wiki page. I need enough operating context to answer three questions fast: is the dashboard safe to use, who makes the next call, and what do we do if it is not?&lt;/p&gt;
&lt;p&gt;The owners tell me who approves definition changes and who runs the first technical check. The source model list narrows the search space before anyone starts opening warehouse tables one at a time.&lt;/p&gt;
&lt;p&gt;The metric note keeps the number aligned with how the business already uses it. The trusted validation slice gives me one fast comparison when the KPI is questioned. The refresh window tells me when the dashboard is safe, not just when a scheduler says something ran. The fallback line tells me whether we use yesterday’s settled snapshot or hold the review until the number is safe.&lt;/p&gt;
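&lt;p&gt;As a sketch, the validation-slice check could be one query that returns a row only when the slice disagrees with the settled snapshot. Everything beyond &lt;code&gt;mart_finance_daily_kpis&lt;/code&gt; is an assumption here: the &lt;code&gt;finance_settled_snapshot&lt;/code&gt; model, the column names, the date literal, and the 0.5% tolerance are placeholders.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- Sketch only: finance_settled_snapshot, all column names, the date,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- and the 0.5% tolerance are assumptions, not part of the spec above.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;with dashboard_slice as (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    select sum(settled_net_revenue) as dashboard_value&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    from {{ ref(&apos;mart_finance_daily_kpis&apos;) }}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    where kpi_date = date &apos;2026-03-12&apos;  -- placeholder: last complete business day&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;      and region = &apos;US&apos;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;      and segment = &apos;enterprise&apos;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;snapshot_slice as (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    select sum(settled_net_revenue) as snapshot_value&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    from {{ ref(&apos;finance_settled_snapshot&apos;) }}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    where snapshot_date = date &apos;2026-03-12&apos;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;      and region = &apos;US&apos;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;      and segment = &apos;enterprise&apos;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    d.dashboard_value,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    s.snapshot_value,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    d.dashboard_value - s.snapshot_value as difference&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;from dashboard_slice d&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;cross join snapshot_slice s&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- Flag a missing side too, so an empty slice does not pass silently.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;where d.dashboard_value is null&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;   or s.snapshot_value is null&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;   or abs(d.dashboard_value - s.snapshot_value) &gt; 0.005 * abs(s.snapshot_value)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;An empty result means the slice agrees within the assumed tolerance. A returned row carries both values and the difference, which is usually enough to start the second check in the failure path.&lt;/p&gt;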
&lt;p&gt;The release checklist governs a change. The operating spec governs the dashboard on an ordinary Tuesday, when nothing is supposed to be changing.&lt;/p&gt;
&lt;p&gt;When the logic, filters, or trusted slice change, I update this spec in the same release path as &lt;a href=&quot;/blog/posts/bi-dashboard-release-checks/&quot;&gt;A dashboard release checklist before a BI change goes live&lt;/a&gt;. If the number is still wrong after publish, I move into &lt;a href=&quot;/blog/posts/dashboard-number-changes/&quot;&gt;When a dashboard number changes, I check these four things first&lt;/a&gt; instead of debating the chart live.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: the spec turns into a long wiki page nobody updates → Mitigation: keep it to one page, store it beside the dashboard or release artifact, and update it in the same change that affects the dashboard.&lt;/li&gt;
&lt;li&gt;Breaks when: the dashboard spec pretends an unresolved metric definition is settled → Mitigation: mark the disputed field clearly and hold the dashboard out of the critical review path until the definition owner decides.&lt;/li&gt;
&lt;li&gt;Breaks when: teams force the full operating spec onto low-risk dashboards and create more process than value → Mitigation: reserve the full version for executive and business-critical dashboards, and use a lighter note for lower-risk BI work.&lt;/li&gt;
&lt;li&gt;Breaks when: the fallback path is vague because nobody wants to say “do not use this number” before a live review → Mitigation: agree on the safe fallback and escalation path before the next meeting depends on the dashboard.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; write the owner, source models, metric note, refresh cutoff, trusted slice, and failure path for one business-critical dashboard before the next review depends on it.&lt;/p&gt;
&lt;p&gt;If an executive dashboard on your team still cannot answer who owns it, when the number becomes safe to use, and what the fallback is when the refresh lands late, that missing spec is usually a shorter fix than another layer of review process.&lt;/p&gt;</content:encoded><category>analytics-engineering</category><category>operations</category><category>dashboard-reliability</category></item><item><title>How I keep dbt Core deployments boring in production</title><link>https://berhanturkkaynagi.com/blog/posts/dbt-core-deployments-boring-production/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/dbt-core-deployments-boring-production/</guid><description>I keep dbt Core deployments boring with one short deployment record: state reference, selector, target, changed nodes, and promotion result before production moves.</description><pubDate>Tue, 10 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A dbt deploy is easiest to trust when another engineer can tell me exactly what will run before production moves.&lt;/p&gt;
&lt;p&gt;For me, this is the upstream counterpart to &lt;a href=&quot;/blog/posts/bi-dashboard-release-checks/&quot;&gt;my BI release checklist&lt;/a&gt;: one compact artifact that makes scope, approval, and rollback legible before a business-critical number changes.&lt;/p&gt;
&lt;p&gt;The failure mode I care about is not only a red job. It is a green promotion nobody can explain after a business-critical model changes.&lt;/p&gt;
&lt;p&gt;That is why I keep the state reference, selector, target, changed nodes, and promotion result visible in one deployment note. If those details disappear behind wrappers or tribal memory, the deploy can still go green and still be hard to trust.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;A dbt Core deploy can look fine in a pull request and still be hard to trust on its way to production.&lt;/p&gt;
&lt;p&gt;SQL review passes. Tests are green. The pipeline UI says a job ran. I still need to know which production manifest I compared against, which selector will run, which target proved the change, and whether production promotion reuses that same logic or disappears behind a different wrapper.&lt;/p&gt;
&lt;p&gt;If those answers are fuzzy, the failure mode is not just a red job. It is a production promotion nobody can explain when a business-critical model changes under time pressure.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Keep one production manifest or state artifact available to CI so changed-node selection is based on a known production reference, not a guess.&lt;/li&gt;
&lt;li&gt;Use one explicit selector for the change set and its downstream blast radius, then carry that selector into the deployment summary.&lt;/li&gt;
&lt;li&gt;Keep environment targets intentional and easy to inspect so CI, staging, and production do not quietly diverge.&lt;/li&gt;
&lt;li&gt;Prove the changed selection in a non-production target first, then promote with the same selector and state logic instead of inventing a second deployment path.&lt;/li&gt;
&lt;li&gt;Attach one short deployment summary that shows the state reference, selector, target, changed nodes, and approval or promotion result.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If another engineer cannot explain what will run before the deploy starts, the workflow is still too opaque. I care more about an inspectable deploy than one more clever wrapper.&lt;/p&gt;
&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
&lt;p&gt;Imagine a pull request that changes &lt;code&gt;fct_revenue&lt;/code&gt; and one downstream finance mart.&lt;/p&gt;
&lt;p&gt;The CI step I want visible is short:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;dbt build --select state:modified+ --state &amp;#x3C;prod-artifacts&gt; --target ci&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The command is short, but the deployment summary is what earns trust:&lt;/p&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;
&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Field&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;PR&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;#284&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Changed model&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;fct_revenue&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Downstream impact&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;mart_finance_revenue&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;State reference&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;manifest.json from the last successful production deploy&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Selector&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;state:modified+&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Target&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;ci&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Changed nodes&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;fct_revenue
mart_finance_revenue&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Approval&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;changed-node build reviewed after ci run passed&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Production promotion&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;reused the same selector and state logic&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Result&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;promoted without widening scope in production&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;That gives me enough context to approve production promotion.&lt;/p&gt;
&lt;p&gt;I can see the production reference, selector, target, changed nodes, and approval path in one place. Most important, production promotion reused the same selection logic instead of inventing a second path at the promotion boundary.&lt;/p&gt;
&lt;p&gt;If the same pull request had a green CI badge but no visible state reference, no selector, and no clear note about production promotion, I would still treat it as fragile. A green run is not the same as an explainable deploy.&lt;/p&gt;
&lt;p&gt;If the deployment note is explicit and a run still goes wrong, I move to &lt;a href=&quot;/blog/posts/pipeline-observability-before-more-orchestration/&quot;&gt;the observability signals that name the failed boundary&lt;/a&gt; to see whether the missed cutoff is recoverable before I open &lt;a href=&quot;/blog/posts/analytics-incident-note-template/&quot;&gt;The incident note template I wish every analytics team used&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: the production manifest or state artifact is missing, stale, or hard to retrieve → Mitigation: publish the last-known production artifacts from CI and make the state reference part of the default deployment record.&lt;/li&gt;
&lt;li&gt;Breaks when: dev, CI, and production targets drift in ways the deployment note does not surface → Mitigation: keep target configuration explicit, review the target assumptions in the same place as the dbt invocation, and avoid hidden environment-specific behavior.&lt;/li&gt;
&lt;li&gt;Breaks when: &lt;code&gt;state:modified+&lt;/code&gt; hides a wider blast radius on a business-critical model → Mitigation: widen the selector deliberately for critical paths and make the extra scope an explicit deployment choice, not a surprise after promotion.&lt;/li&gt;
&lt;li&gt;Breaks when: CI wrappers make dbt feel automated but nobody can tell which command, selector, or artifact actually ran → Mitigation: expose the exact dbt invocation and artifact references in the deployment summary so the run stays inspectable.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; For one business-critical dbt model, write the deployment summary you want to see before its next production promotion: state reference, selector, target, changed nodes, and approval.&lt;/p&gt;
&lt;p&gt;The promotion is easier to trust when the green run can be tied back to the manifest, selector, target, changed nodes, and approval without archaeology.&lt;/p&gt;</content:encoded><category>analytics-engineering</category><category>operations</category><category>data-reliability</category></item><item><title>A dashboard release checklist before a BI change goes live</title><link>https://berhanturkkaynagi.com/blog/posts/bi-dashboard-release-checks/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/bi-dashboard-release-checks/</guid><description>I use a short BI release checklist to compare outputs, review filter and definition changes, confirm sign-off, and keep a rollback path before business-critical dashboard changes go live.</description><pubDate>Sun, 08 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A business-critical dashboard change is not ready just because the pull request looks tidy.&lt;/p&gt;
&lt;p&gt;Before I approve a BI change, I want one small release artifact that says which slice was checked, what number should move, who signed off, and how I roll it back if the live result lands wrong.&lt;/p&gt;
&lt;p&gt;It plays the same role for BI that &lt;a href=&quot;/blog/posts/dbt-core-deployments-boring-production/&quot;&gt;the deployment summary I want for dbt Core promotions&lt;/a&gt; plays upstream: one compact artifact that makes scope and approval visible before promotion, so the leadership review is not the first serious review of the release.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;A dashboard change can look small in code review and still change the number leadership sees in production.&lt;/p&gt;
&lt;p&gt;One filter tweak, one rewritten chart calculation, or one quiet definition note can change the story leaders see at &lt;code&gt;08:00 ET&lt;/code&gt;. If nobody compares outputs on a trusted slice, records the expected delta, and names the rollback path, the team ends up explaining the number during the review instead of shipping it with confidence.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Freeze the release scope first: which tiles, metrics, filters, and date windows are actually changing.&lt;/li&gt;
&lt;li&gt;Compare before-and-after outputs on one validation slice the business owner already trusts, using the same date window and filters on both sides.&lt;/li&gt;
&lt;li&gt;Read the filter logic and metric-definition changes line by line in the pull request or release note, not just in chart screenshots.&lt;/li&gt;
&lt;li&gt;Write down the expected visible difference before the dashboard is republished.&lt;/li&gt;
&lt;li&gt;Get explicit owner sign-off on the changed slice and the definition note.&lt;/li&gt;
&lt;li&gt;Keep one rollback path ready before I call the release safe, including the revert step and the prior comparison snapshot.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is release evidence, not polish review. A dashboard can look cleaner and still be less trustworthy if the release note never says which filter changed or why the number moved.&lt;/p&gt;
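&lt;p&gt;The before-and-after comparison can be sketched as one query over the same slice on both sides. The &lt;code&gt;analytics_prod&lt;/code&gt; and &lt;code&gt;analytics_ci&lt;/code&gt; schemas, the &lt;code&gt;mart_revenue_scorecard&lt;/code&gt; relation, the column names, and the date window are all hypothetical placeholders, and &lt;code&gt;full outer join&lt;/code&gt; assumes a warehouse that supports it.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- Sketch only: the analytics_prod and analytics_ci schemas, the&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- mart_revenue_scorecard relation, and every column name are assumptions.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;with before_change as (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    select revenue_date, sum(net_revenue) as net_revenue&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    from analytics_prod.mart_revenue_scorecard&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    where revenue_date between date &apos;2026-02-01&apos; and date &apos;2026-02-28&apos;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;      and region = &apos;US&apos;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    group by revenue_date&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;after_change as (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    select revenue_date, sum(net_revenue) as net_revenue&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    from analytics_ci.mart_revenue_scorecard&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    -- same date window and filters as before_change&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    where revenue_date between date &apos;2026-02-01&apos; and date &apos;2026-02-28&apos;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;      and region = &apos;US&apos;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    group by revenue_date&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    coalesce(b.revenue_date, a.revenue_date) as revenue_date,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    b.net_revenue as before_net_revenue,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    a.net_revenue as after_net_revenue,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    coalesce(a.net_revenue, 0) - coalesce(b.net_revenue, 0) as delta&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;from before_change b&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;full outer join after_change a&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    on a.revenue_date = b.revenue_date&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;order by 1&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;delta&lt;/code&gt; column is what gets compared against the expected visible difference written down before republishing. A day that appears on only one side shows up with a null on the other, which is worth explaining before sign-off.&lt;/p&gt;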
&lt;h2 id=&quot;example-the-dashboard-release-checklist-in-the-pull-request&quot;&gt;Example: the dashboard release checklist in the pull request&lt;/h2&gt;
&lt;p&gt;Here is the kind of release artifact I want attached to a revenue dashboard pull request before it goes live:&lt;/p&gt;
&lt;p&gt;Dashboard release checklist&lt;/p&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;
&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Field&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Dashboard&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;executive revenue scorecard&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;PR&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;#418&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Change owner&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;analytics engineering&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Business owner&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;finance analytics lead&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Validation slice&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;2025-11-24 to 2025-12-02, US enterprise orders only&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Change summary&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Exclude sandbox and QA orders from the revenue filter
Update the revenue mix chart to use settled net revenue
Update the seven-day trend chart denominator to match the settled view&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;
&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Review step&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;What I confirm&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Status&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Before/after output comparison&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Headline revenue: $4,218,330 -&gt; $4,201,980 (-0.39%), expected from sandbox-order removal
Revenue mix by segment: enterprise share changes from 61.2% -&gt; 61.5%, expected
Seven-day trend: two days move because the denominator now matches settled revenue&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;pass&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Filter logic review&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;New exclusion: order_source in (&apos;sandbox&apos;, &apos;qa&apos;)
No change to refunded-order handling
No date-window change outside the stated validation slice&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;pass&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Definition note&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Dashboard note added: revenue mix and seven-day trend now use settled net revenue instead of booked gross revenue
Release note linked in the PR summary for downstream readers&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;pass&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Owner sign-off&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Finance analytics lead reviewed the changed slice at 16:10 ET
Expected visible differences match the release note&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;approved&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Rollback path&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Revert PR #418 and republish the previous dashboard definition
Keep the prior validation-slice snapshot in the release note for comparison&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;ready&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;That checklist earns its space because each line changes the next decision.&lt;/p&gt;
&lt;p&gt;The before-and-after slice tells me whether the delta is understood. The filter review catches quiet scope changes. The definition note keeps the dashboard aligned with how the business talks about the metric. Owner sign-off makes the change explicit. The rollback path keeps the release safe if the live result still surprises us.&lt;/p&gt;
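&lt;p&gt;The headline comparison in the checklist is plain arithmetic, and it is worth recomputing rather than trusting the note. The sketch below checks the stated move from $4,218,330 to $4,201,980 against the expected -0.39%. The function names and the 0.05 percentage-point tolerance are my assumptions for illustration, not part of any BI tool.&lt;/p&gt;

```python
def pct_change(before: float, after: float) -> float:
    """Percent change from the before value to the after value."""
    return (after - before) / before * 100.0


def delta_matches(before: float, after: float, expected_pct: float,
                  tolerance_pct: float = 0.05) -> bool:
    """True when the observed move is within tolerance of the expected move."""
    return tolerance_pct >= abs(pct_change(before, after) - expected_pct)


# Headline revenue on the validation slice, taken from the checklist above.
headline_before = 4_218_330
headline_after = 4_201_980

observed = pct_change(headline_before, headline_after)
print(f"{observed:+.2f}%")
print(delta_matches(headline_before, headline_after, expected_pct=-0.39))
```

&lt;p&gt;If the observed move falls outside the tolerance, the release note is wrong or the change did more than the pull request says. Either way, I stop before republishing.&lt;/p&gt;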
&lt;p&gt;If the release still lands wrong after that, I move into incident mode and keep &lt;a href=&quot;/blog/posts/analytics-incident-note-template/&quot;&gt;The incident note template I wish every analytics team used&lt;/a&gt; open beside the debugging work. The point of the checklist is to make that handoff rare.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: teams force the full checklist onto low-risk dashboards and make routine edits slower than they need to be → Mitigation: keep the strict version for executive and business-critical dashboards, and use a lighter release note for low-risk BI work.&lt;/li&gt;
&lt;li&gt;Breaks when: there is no stable validation slice or baseline report to compare against → Mitigation: create one trusted slice before the release review starts, even if the first version is manual and narrow.&lt;/li&gt;
&lt;li&gt;Breaks when: the filter or metric definition is still disputed when the release review starts → Mitigation: hold the go-live and resolve the definition in writing before anyone debates chart polish.&lt;/li&gt;
&lt;li&gt;Breaks when: an urgent hotfix needs to land during an active incident → Mitigation: collapse the checklist to the trust boundary, owner approval, and rollback steps first, then backfill the release note after the number is safe again.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; For one business-critical dashboard, write the release checklist you want attached to its next logic or filter change before it reaches leadership.&lt;/p&gt;
&lt;p&gt;The release conversation feels calmer when the expected delta and rollback path are visible before anyone is defending a surprising card.&lt;/p&gt;</content:encoded><category>analytics-engineering</category><category>operations</category><category>dashboard-reliability</category></item><item><title>The incident note template I wish every analytics team used</title><link>https://berhanturkkaynagi.com/blog/posts/analytics-incident-note-template/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/analytics-incident-note-template/</guid><description>I keep a one-page incident note during analytics incidents so impact, checks, decisions, cause, and prevention stay clear while the KPI is still broken.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;At &lt;code&gt;08:12 ET&lt;/code&gt;, I want one page open before I want another Slack thread.&lt;/p&gt;
&lt;p&gt;Slack fills up, someone asks whether the leadership review is still safe, and half the useful evidence stays trapped in query tabs or terminal history.&lt;/p&gt;
&lt;p&gt;At the first real check, I open a one-page incident note and start writing timestamps, evidence, and the current decision. I am not trying to document everything. I am trying to keep the incident legible while the KPI is still moving.&lt;/p&gt;
&lt;p&gt;The goal is one visible source for the timeline, evidence, and current decision before the meeting clock takes over.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Imagine an executive revenue KPI is &lt;code&gt;+11.8%&lt;/code&gt; versus the settled baseline at &lt;code&gt;08:05 ET&lt;/code&gt;. The leadership review starts at &lt;code&gt;08:30 ET&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;At that point, I need more than the fix. I need one page that says whether the dashboard is safe, which checks already passed, what I think is broken, and when the next update goes out.&lt;/p&gt;
&lt;p&gt;Without that note, the same questions get asked twice. Someone reruns a query I already checked. The review gets delayed for the wrong reason. By the time the issue is fixed, the cause and prevention item have already started to fade.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Open the note as soon as the incident affects a real meeting, KPI, or decision.&lt;/li&gt;
&lt;li&gt;Put impact, owner, next update time, and the current dashboard-safe decision at the top so nobody has to search for the current state.&lt;/li&gt;
&lt;li&gt;Log checks in the order I run them, with one timestamped evidence line per check.&lt;/li&gt;
&lt;li&gt;Keep &lt;code&gt;hypothesis&lt;/code&gt; separate from &lt;code&gt;confirmed cause&lt;/code&gt; so the note stays honest while the investigation is still moving.&lt;/li&gt;
&lt;li&gt;Record the operating decision explicitly: hold the number, use yesterday’s settled snapshot, delay the review, or republish after the fix.&lt;/li&gt;
&lt;li&gt;Close the note with one prevention item and one owner before I call the incident done.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If the run itself is late, I work from &lt;a href=&quot;/blog/posts/pipeline-observability-before-more-orchestration/&quot;&gt;the five-signal observability panel for late analytics runs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If the data landed but the number changed, I use &lt;a href=&quot;/blog/posts/dashboard-number-changes/&quot;&gt;When a dashboard number changes, I check these four things first&lt;/a&gt; to walk through freshness, row counts, joins, and metric logic.&lt;/p&gt;
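&lt;p&gt;The default approach above maps onto a small record. This is a sketch only: the &lt;code&gt;IncidentNote&lt;/code&gt; class and its method names are hypothetical, and in practice the note can live in any shared document. The point is that hypothesis and confirmed cause are separate fields, and that closing depends on a prevention item with an owner.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class IncidentNote:
    """Hypothetical one-page note; field names mirror the list above."""
    impact: str
    owner: str
    next_update: str
    current_decision: str
    checks: list[str] = field(default_factory=list)
    hypothesis: str = ""        # what I think is broken
    confirmed_cause: str = ""   # filled in only once evidence supports it
    prevention: str = ""
    prevention_owner: str = ""

    def log_check(self, timestamp: str, evidence: str) -> None:
        """Append one timestamped evidence line, in the order checks run."""
        self.checks.append(f"{timestamp} {evidence}")

    def can_close(self) -> bool:
        """Closing needs one prevention item and one owner."""
        return bool(self.prevention and self.prevention_owner)


note = IncidentNote(
    impact="08:30 ET leadership review is not safe to run from the live dashboard",
    owner="analytics engineering",
    next_update="08:20 ET",
    current_decision="use yesterday's settled snapshot",
)
note.log_check("08:12 ET", "freshness check passes")
note.hypothesis = "New promotion override codes were not in the morning dimension sync."
print(note.can_close())
```

&lt;p&gt;Writing the hypothesis does not touch &lt;code&gt;confirmed_cause&lt;/code&gt;, which keeps the note honest while the investigation is still moving.&lt;/p&gt;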
&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
&lt;p&gt;This is the kind of live note I want open during that revenue incident:&lt;/p&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;
&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Field&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Incident&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;executive revenue KPI is +11.8% vs settled baseline&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Started&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;2025-12-09 08:05 ET&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Owner&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;analytics engineering&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Impact&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;08:30 ET leadership review is not safe to run from the live dashboard&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Current decision&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;use yesterday&apos;s settled snapshot until the dashboard is republished&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Next update&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;08:20 ET&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Timeline&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;08:05 ET alert from revenue dashboard and finance Slack thread
08:12 ET freshness check passes; finance extract landed at 07:11 ET
08:18 ET row-count check passes; fact_sales volume is within 0.8% of recent Tuesdays
08:27 ET join check fails; unmatched promotion keys jump from 0.4% to 14.2%
08:41 ET mapping fix deployed and model rebuilt
08:50 ET dashboard republished&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Checks run&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Freshness: ok — latest finance snapshot landed on time
Row counts: ok — fact_sales volume is stable
Joins: off — promotion mapping dropped a material set of rows
Metric logic: unchanged — no dashboard filter or definition edit since yesterday&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Hypothesis&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;New promotion override codes were not included in the morning dimension sync.&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Confirmed cause&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;The morning dim_promotions sync excluded new override codes from the ERP feed.&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Fix&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Backfill the missing promotion codes, rebuild the model, and republish the dashboard.&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Prevention&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;Add an unmatched-key alert on the promotion join before the next finance review.&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Prevention owner&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;analytics engineering&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;That note is enough for the incident I am running. I can answer status questions, keep the operating decision visible, and avoid rebuilding the same timeline later.&lt;/p&gt;
&lt;p&gt;I care less about the document itself than whether impact, evidence, decision, and prevention sit in one place while the KPI is still moving. If I already have the pipeline checks in place, this note becomes the record of what I actually saw, ruled out, and decided.&lt;/p&gt;
&lt;p&gt;The headings stay simple on purpose: impact, owner, current decision, next update, timeline, checks run, hypothesis, confirmed cause, fix, and prevention. If a field does not change the next action or preserve evidence, I leave it out.&lt;/p&gt;
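&lt;p&gt;The prevention item in that note, an unmatched-key alert on the promotion join, comes down to one ratio. Here is a minimal sketch of the check. The function names, the 2% alert threshold, and the toy promotion keys are all hypothetical. Only the idea of alerting on the unmatched-key rate comes from the note itself.&lt;/p&gt;

```python
def unmatched_key_rate(fact_keys: list[str], dim_keys: set[str]) -> float:
    """Percent of fact rows whose key has no match in the dimension."""
    if not fact_keys:
        return 0.0
    unmatched = sum(1 for key in fact_keys if key not in dim_keys)
    return unmatched / len(fact_keys) * 100.0


def join_check(rate_pct: float, alert_above_pct: float = 2.0) -> str:
    """Return 'ok' or 'off' for the checks-run line; the threshold is assumed."""
    return "off" if rate_pct > alert_above_pct else "ok"


# Toy data: seven sales rows, one override code missing from the dimension.
fact_promo_keys = ["P10", "P10", "P11", "P12", "P12", "OVR-1", "P11"]
dim_promo_keys = {"P10", "P11", "P12"}

rate = unmatched_key_rate(fact_promo_keys, dim_promo_keys)
print(f"{rate:.1f}% unmatched, join check {join_check(rate)}")
```

&lt;p&gt;With this threshold, the 0.4% rate from the timeline would read as ok and the 14.2% rate would read as off, which matches the checks-run line for joins.&lt;/p&gt;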
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: the note gets opened after the fix and turns into reconstructed memory → Mitigation: start the note on the first real check, even if the first version is only four lines.&lt;/li&gt;
&lt;li&gt;Breaks when: the team writes paragraphs instead of evidence lines → Mitigation: keep one bullet per timestamp and one line per check.&lt;/li&gt;
&lt;li&gt;Breaks when: a hypothesis hardens into a fake root cause because it was written too early → Mitigation: keep separate headings for &lt;code&gt;hypothesis&lt;/code&gt; and &lt;code&gt;confirmed cause&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Breaks when: the incident closes with no prevention owner → Mitigation: require one named follow-up item before marking the incident complete.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Pick one business-critical KPI and write the headers for your live incident note before the next morning review turns noisy.&lt;/p&gt;
&lt;p&gt;If the next analytics incident would still be reconstructed from chat, use one responder artifact—the timestamp, check, decision, and owner line—to make the first review easier.&lt;/p&gt;</content:encoded><category>operations</category><category>analytics-engineering</category><category>data-reliability</category></item><item><title>Five pipeline observability signals before more orchestration</title><link>https://berhanturkkaynagi.com/blog/posts/pipeline-observability-before-more-orchestration/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/pipeline-observability-before-more-orchestration/</guid><description>I use five signals—source freshness, latest publish, runtime, output shape, and owner—to make a missed analytics pipeline cutoff explainable before I add more orchestration.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;At &lt;code&gt;07:34&lt;/code&gt;, I do not need another orchestration knob. I need signals that tell me whether the &lt;code&gt;08:00 ET&lt;/code&gt; review is still recoverable.&lt;/p&gt;
&lt;p&gt;Before I add more orchestration, I want a small operating view: Did the source land? What was the latest successful publish? Did runtime drift? Does the output still look normal? Who owns the first check?&lt;/p&gt;
&lt;p&gt;I keep that view separate from &lt;a href=&quot;/blog/posts/row-counts-are-not-enough/&quot;&gt;Row counts are not enough: the checks I add before I trust a pipeline&lt;/a&gt; because those checks live inside the data path. This panel is for the moment the publish is late, stale, or failed and I need the next investigation step fast.&lt;/p&gt;
&lt;p&gt;If the panel says the publish completed and the output still looks wrong downstream, I move to &lt;a href=&quot;/blog/posts/dashboard-number-changes/&quot;&gt;When a dashboard number changes, I check these four things first&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Imagine a daily &lt;code&gt;inventory_availability&lt;/code&gt; pipeline that needs to publish by &lt;code&gt;07:30 ET&lt;/code&gt; for an &lt;code&gt;08:00 ET&lt;/code&gt; operations review.&lt;/p&gt;
&lt;p&gt;At &lt;code&gt;07:34&lt;/code&gt;, the orchestrator shows a failed run. That matters, but it is not enough. I still need to know whether the source extract was late, whether the curated model slowed down, whether the latest publish is still yesterday’s, and whether there is a clear owner for the first investigation step.&lt;/p&gt;
&lt;p&gt;When those answers are missing, the team starts shopping for another retry rule or dependency feature. I would rather make the missed run legible first.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Start with one business-critical pipeline and one real business cutoff time.&lt;/li&gt;
&lt;li&gt;Show one upstream handoff signal, usually source freshness with the latest landed timestamp.&lt;/li&gt;
&lt;li&gt;Show one publish-completion signal, usually the latest successful publish or latest published partition timestamp.&lt;/li&gt;
&lt;li&gt;Track run duration against a normal band, not just success versus failure.&lt;/li&gt;
&lt;li&gt;Add one output-shape signal on the published model, such as row-count delta or unmatched-key rate.&lt;/li&gt;
&lt;li&gt;Put the owner and the first runbook question in the same view so the handoff starts immediately.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I do not want a wall of task states. I want a boundary view. Each signal should change either the next investigation step or my confidence in the published output.&lt;/p&gt;
&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
&lt;p&gt;Here is the five-signal panel I want for that &lt;code&gt;inventory_availability&lt;/code&gt; run. The companion &lt;a href=&quot;/blog/table/#pipeline-observability-signals&quot;&gt;table view of the five-signal panel&lt;/a&gt; shows the same review surface:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Source extract freshness — expected &lt;code&gt;landed by 06:10 ET&lt;/code&gt;; observed &lt;code&gt;06:08 ET&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Latest successful publish — expected &lt;code&gt;today by 07:30 ET&lt;/code&gt;; observed &lt;code&gt;2025-12-01 07:14 ET&lt;/code&gt; (yesterday’s run)&lt;/li&gt;
&lt;li&gt;Curated model duration — expected &lt;code&gt;12–15 minutes&lt;/code&gt;; observed &lt;code&gt;41 minutes&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Output row-count delta — expected &lt;code&gt;within +/-2%&lt;/code&gt;; observed &lt;code&gt;+0.4%&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Owner + first check — owner &lt;code&gt;platform analytics&lt;/code&gt;; first check &lt;code&gt;inspect recent model changes and step runtime&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That pattern tells me where to start.&lt;/p&gt;
&lt;p&gt;The source landed on time, so I do not start with ingestion. The latest successful publish is still yesterday’s, so today’s data did not make it across the finish line. The output row-count delta is stable, so this does not look like a missing partition or obvious shape break. The inflated duration points me at the transform or publish path before I waste time debating retries, dependencies, or dashboard logic.&lt;/p&gt;
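&lt;p&gt;That reading order can be written down as a small decision function. This is a sketch under assumptions: the function name, its parameters, and the order of the branches are mine, and the run date of 2025-12-02 is inferred from the panel showing the last publish as the previous day.&lt;/p&gt;

```python
from datetime import date, datetime


def first_investigation_step(source_landed: datetime, source_due: datetime,
                             last_publish: datetime, run_day: date,
                             duration_min: float, normal_max_min: float,
                             row_delta_pct: float, row_band_pct: float) -> str:
    """Map the panel readings to one starting point (assumed ordering)."""
    if source_landed > source_due:
        return "ingestion: source extract landed late"
    if last_publish.date() == run_day:
        return "published today: check downstream dashboard logic"
    if abs(row_delta_pct) > row_band_pct:
        return "output shape: look for a missing partition"
    if duration_min > normal_max_min:
        return "transform or publish path: inspect model changes and step runtime"
    return "no signal out of band: escalate to the owner"


# Readings from the inventory_availability panel above.
step = first_investigation_step(
    source_landed=datetime(2025, 12, 2, 6, 8),
    source_due=datetime(2025, 12, 2, 6, 10),
    last_publish=datetime(2025, 12, 1, 7, 14),
    run_day=date(2025, 12, 2),
    duration_min=41, normal_max_min=15,
    row_delta_pct=0.4, row_band_pct=2.0,
)
print(step)
```

&lt;p&gt;The function is less important than the habit it encodes: each signal either rules out a starting point or names one.&lt;/p&gt;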
&lt;p&gt;The audit query behind that panel is small on purpose:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  run_date,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  source_landed_at,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  published_at,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  duration_minutes,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  output_row_count&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; pipeline_run_audit&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;where&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; pipeline_name &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#032F62;--shiki-dark:#9ECBFF&quot;&gt; &apos;inventory_availability&apos;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;order by&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; run_date &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;desc&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;limit&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; 7&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I am not trying to build a perfect monitoring product. I want enough history to compare today’s run to the normal band, assign the first check cleanly, and keep the incident moving.&lt;/p&gt;
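&lt;p&gt;Comparing today’s run to the normal band takes very little code once the audit rows are in hand. The sketch below builds a band around the median of recent durations. The prior durations are made-up values inside the 12 to 15 minute range the panel expects, and the 25% slack is an assumption, not a recommendation.&lt;/p&gt;

```python
from statistics import median


def duration_band(history_min: list[float],
                  slack: float = 0.25) -> tuple[float, float]:
    """Band around the median of recent runs; the 25% slack is an assumption."""
    mid = median(history_min)
    return mid * (1 - slack), mid * (1 + slack)


def outside_band(today_min: float, band: tuple[float, float]) -> bool:
    """True when today's duration falls below or above the band."""
    low, high = band
    return today_min > high or low > today_min


# Hypothetical durations for the six prior rows returned by the audit query.
prior_runs = [12, 13, 15, 14, 12, 13]
band = duration_band(prior_runs)
print(band, outside_band(41, band))
```

&lt;p&gt;I use the median rather than the mean here so that one earlier slow run does not widen the band and hide the next one.&lt;/p&gt;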
&lt;p&gt;That is also why this operating view stays small. It does not replace the checks inside the data path. It only has to tell me where to look first when a run is late or fails.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: teams instrument every task in the DAG and bury the useful signals under noise → Mitigation: start with one pipeline tied to one business deadline and keep only the signals that change the first investigation step.&lt;/li&gt;
&lt;li&gt;Breaks when: the orchestration UI shows task state but nothing about the published dataset → Mitigation: pair workflow state with one publish-completion signal and one business-output signal.&lt;/li&gt;
&lt;li&gt;Breaks when: one platform-wide SLA hides different lateness patterns across feeds → Mitigation: define freshness windows per dataset and per business expectation, not as one generic cutoff.&lt;/li&gt;
&lt;li&gt;Breaks when: alerts fire but nobody knows who owns the fix or what to check first → Mitigation: put the owner and the first runbook question in the same panel.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Pick one business-critical pipeline and write down the five signals you need before the next missed SLA turns into a meeting.&lt;/p&gt;
&lt;p&gt;If a cutoff was missed recently, use one pipeline’s five-signal panel to decide whether the next fix is ownership, freshness, publication evidence, or orchestration work.&lt;/p&gt;</content:encoded><category>analytics-engineering</category><category>observability</category><category>data-pipelines</category><category>data-platforms</category><category>operations</category></item><item><title>How I validate a metric after a backfill</title><link>https://berhanturkkaynagi.com/blog/posts/metric-definition-survives-backfill/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/metric-definition-survives-backfill/</guid><description>I validate a rebuilt metric with a short backfill note, slice-by-slice comparison, and a published restatement boundary before I republish the dashboard.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Before I rerun a backfill, I write down the slices that should move and the ones that should stay flat. That short note tells me whether the rebuild is correcting history or merely moving numbers around.&lt;/p&gt;
&lt;p&gt;When I validate a metric after a backfill, I start with the blast radius, not the total: which months, cohorts, or plans should move, which ones should stay flat, and what I need to explain in &lt;a href=&quot;/blog/posts/bi-dashboard-release-checks/&quot;&gt;the BI release checklist before a dashboard change goes live&lt;/a&gt; so the dashboard can return to production.&lt;/p&gt;
&lt;p&gt;If metric meaning or row meaning is still fuzzy, a backfill exposes it fast.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Imagine I fix a billing bug that misclassifies invoice reversals in monthly subscription revenue. The code fix is small. The risky part comes next: I need to backfill the last 90 days and republish finance dashboards that leadership already used in prior reviews.&lt;/p&gt;
&lt;p&gt;A 1.8% move in February is not automatically good or bad. The real question is whether the movement lands where I expected. If I cannot say which months, plans, or customer cohorts should change before I start the rebuild, I am not validating a metric. I am rerolling history and hoping the new total looks more credible.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Freeze the metric definition, owner, and grain before I rerun anything.&lt;/li&gt;
&lt;li&gt;Write a short backfill note that names the bug, the date window, the affected tables and dashboards, the expected movers, and the expected non-movers.&lt;/li&gt;
&lt;li&gt;Snapshot the pre-backfill result so I can compare before and after instead of relying on memory, screenshots, or dashboard cache.&lt;/li&gt;
&lt;li&gt;Compare the rebuild by stable slices such as billing month, plan type, or customer cohort, not just the top-line total.&lt;/li&gt;
&lt;li&gt;Investigate two kinds of surprises: slices that moved when they should have stayed flat, and slices that stayed flat when they should have moved.&lt;/li&gt;
&lt;li&gt;Publish the restatement boundary in one short note so downstream users know the rebuilt window, the expected movers, the expected non-movers, and when the rebuilt numbers become final.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That sequence gives me an explanation before I republish anything. It also helps me separate a real metric fix from unrelated model drift.&lt;/p&gt;
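&lt;p&gt;One way to make the backfill note hard to skip is to keep it as a small structured record. The sketch below is an assumption, not part of any existing tooling: the &lt;code&gt;BackfillNote&lt;/code&gt; class and the &lt;code&gt;missing_fields&lt;/code&gt; helper are hypothetical names, and the values mirror the example note later in this post.&lt;/p&gt;

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class BackfillNote:
    """Hypothetical container for the short note written before a rerun."""
    metric: str
    owner: str
    grain: str
    reason: str
    window_start: date
    window_end: date
    expected_movers: tuple
    expected_non_movers: tuple
    dashboards_affected: tuple


def missing_fields(note):
    """List the fields left empty, so the rerun can wait until they are filled."""
    return [f.name for f in fields(note) if not getattr(note, f.name)]


note = BackfillNote(
    metric="subscription_revenue_monthly",
    owner="finance analytics",
    grain="one billed_account_id per billing_month",
    reason="invoice reversals were excluded from the monthly revenue adjustment logic",
    window_start=date(2025, 8, 1),
    window_end=date(2025, 10, 31),
    expected_movers=("monthly plans with reversal activity",),
    expected_non_movers=(),  # deliberately left empty to show the check
    dashboards_affected=("finance MRR review", "monthly retention pack"),
)

print(missing_fields(note))
```

&lt;p&gt;The frozen dataclass is a loose analogue of freezing the definition, owner, and grain: once the note exists, a rerun cannot quietly edit it.&lt;/p&gt;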
&lt;h2 id=&quot;example-a-90-day-revenue-backfill&quot;&gt;Example: a 90-day revenue backfill&lt;/h2&gt;
&lt;p&gt;Before I start the rebuild, I want a short note like this:&lt;/p&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;
&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Field&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Metric&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;subscription_revenue_monthly&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Owner&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;finance analytics&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Grain&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;one billed_account_id per billing_month&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Reason for backfill&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;invoice reversals were excluded from the monthly revenue adjustment logic&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Backfill window&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;2025-08-01 through 2025-10-31&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Expected movers&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;monthly plans with reversal activity
customer cohorts billed inside the 90-day window&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Expected non-movers&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;annual prepaid plans
free plans
months before 2025-08&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Dashboards affected&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;finance MRR review
monthly retention pack&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Then I compare before and after by a stable slice:&lt;/p&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;
&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;billing_month&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;plan_type&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;revenue_before&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;revenue_after&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;delta&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;expected&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;2025-08&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;monthly&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;182400&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;190900&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;+8500&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;yes&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 
dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;2025-08&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;annual&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;264000&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;264000&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;0&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;no&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;2025-09&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;monthly&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;188100&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;195700&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;+7600&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;yes&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;2025-09&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;annual&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;271500&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;274900&lt;/td&gt;&lt;td 
class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;+3400&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;no&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;2025-10&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;monthly&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;191800&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;199600&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;+7800&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;yes&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;2025-10&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;annual&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;279200&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;279200&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;0&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;no&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The restatement boundary I want published beside the dashboard is short:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Restatement notice&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- rebuilt window: 2025-08-01 through 2025-10-31&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- expected movers: monthly plans with reversal activity inside that window&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- expected non-movers: annual prepaid plans, free plans, months before 2025-08&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- numbers outside this boundary should remain unchanged&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;- dashboard stays unpublished until unexpected movement is explained&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The top-line move is directionally right, but the rebuild is still not validated. September annual-plan revenue moved by +3400 even though annual prepaid plans are listed as expected non-movers. That one mismatch is enough for me to stop and investigate.&lt;/p&gt;
&lt;p&gt;I keep the comparison query simple on purpose:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;with&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; before_backfill &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;  select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;    billing_month,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;    plan_type,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;    sum&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(revenue_usd) &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; revenue_before&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;  from&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; finance&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;subscription_revenue_monthly_before&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;  group by&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; 1&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;2&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;after_backfill &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;  select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;    billing_month,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;    plan_type,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;    sum&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(revenue_usd) &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; revenue_after&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;  from&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; finance&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;subscription_revenue_monthly&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;  group by&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; 1&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;2&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;  coalesce&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;a&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;billing_month&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;b&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;billing_month&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;) &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; billing_month,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;  coalesce&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;a&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;plan_type&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;b&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;plan_type&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;) &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; plan_type,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;  coalesce&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;b&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;revenue_before&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;) &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; revenue_before,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;  coalesce&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;a&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;revenue_after&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;) &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; revenue_after,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;  coalesce&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;a&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;revenue_after&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;) &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; coalesce&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;b&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;revenue_before&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;) &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; delta&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; before_backfill b&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;full outer join&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; after_backfill a&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;  on&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; a&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;billing_month&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; =&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; b&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;billing_month&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; and&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; a&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;plan_type&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; =&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; b&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;plan_type&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;order by&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; 1&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;2&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If I see that unexpected annual delta, I do not republish the dashboard yet. My next check is whether a dimension change or a plan-mapping fix got bundled into the same release. Then I validate the unexpected slice on its own before I reopen the dashboard. Bundled changes like that are how a clean metric correction gets confused with a wider model change.&lt;/p&gt;
&lt;p&gt;This is why I do not validate a metric after a backfill by comparing totals alone. A metric can land close to the expected overall number and still move the wrong cohort, the wrong plan, or the wrong time period.&lt;/p&gt;
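&lt;p&gt;The two kinds of surprise can be checked mechanically once the comparison rows exist. The Python sketch below uses the six rows from the comparison table in this post. It reads the &lt;code&gt;expected&lt;/code&gt; column as &quot;expected to move&quot;, which is my interpretation, and &lt;code&gt;find_surprises&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
# Slice rows copied from the comparison table in this post:
# (billing_month, plan_type, revenue_before, revenue_after, expected_to_move)
ROWS = [
    ("2025-08", "monthly", 182400, 190900, True),
    ("2025-08", "annual", 264000, 264000, False),
    ("2025-09", "monthly", 188100, 195700, True),
    ("2025-09", "annual", 271500, 274900, False),
    ("2025-10", "monthly", 191800, 199600, True),
    ("2025-10", "annual", 279200, 279200, False),
]


def find_surprises(rows):
    """Return slices whose movement disagrees with the backfill note."""
    surprises = []
    for month, plan, before, after, expected_to_move in rows:
        moved = after != before
        if moved and not expected_to_move:
            surprises.append((month, plan, after - before, "moved but should be flat"))
        elif expected_to_move and not moved:
            surprises.append((month, plan, 0, "flat but should have moved"))
    return surprises


total_before = sum(row[2] for row in ROWS)
total_after = sum(row[3] for row in ROWS)

# The total delta alone looks plausible; the slice check is what catches the problem.
print(total_after - total_before)
print(find_surprises(ROWS))
```

&lt;p&gt;On these rows the total delta is 27300, and the only flagged slice is 2025-09 annual with a delta of 3400. The total would not have told me that.&lt;/p&gt;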
&lt;p&gt;If the backfill still produces unexplained movement after this comparison, I treat it like a fresh reliability problem. At that point I go back to &lt;a href=&quot;/blog/posts/row-counts-are-not-enough/&quot;&gt;the checks I add before I trust a pipeline&lt;/a&gt; or &lt;a href=&quot;/blog/posts/analytics-incident-note-template/&quot;&gt;the incident note template I use when a dashboard number changes&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: source history is incomplete, overwritten, or missing key fields needed to reconstruct the old logic → Mitigation: make the reconstruction limit explicit and say which periods or slices are no longer fully trustworthy.&lt;/li&gt;
&lt;li&gt;Breaks when: a slowly changing dimension or mapping rule changed in the same deployment as the metric fix → Mitigation: isolate the logic repair from the dimensional change, or validate each one with its own before-and-after comparison.&lt;/li&gt;
&lt;li&gt;Breaks when: the team only checks the total delta and never checks the shape of the change → Mitigation: compare by stable slices such as month, cohort, plan, or region before republishing dashboards.&lt;/li&gt;
&lt;li&gt;Breaks when: downstream users assume historical numbers never move after month-end → Mitigation: publish a restatement window and label which outputs are settled versus still allowed to change.&lt;/li&gt;
&lt;/ul&gt;
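&lt;p&gt;The restatement boundary can be checked the same way: months outside the rebuilt window should not change at all. This is a rough Python sketch under stated assumptions. The window dates and the August totals come from the example in this post, while the 2025-07 values and the helper names are invented for illustration.&lt;/p&gt;

```python
from datetime import date

# Rebuilt window taken from the restatement notice in this post.
WINDOW_START = date(2025, 8, 1)
WINDOW_END = date(2025, 10, 31)

# Monthly totals keyed by the first day of each billing month.
# The 2025-08 totals sum the monthly and annual rows from the comparison table.
# The 2025-07 values are hypothetical and not from the post.
before = {date(2025, 7, 1): 440000, date(2025, 8, 1): 446400}
after = {date(2025, 7, 1): 440000, date(2025, 8, 1): 454900}


def in_window(month_start):
    """True when the billing month starts inside the rebuilt window."""
    return month_start >= WINDOW_START and WINDOW_END >= month_start


def boundary_violations(before, after):
    """Months outside the rebuilt window whose totals changed anyway."""
    months = sorted(set(before) | set(after))
    return [
        m for m in months
        if not in_window(m) and before.get(m, 0) != after.get(m, 0)
    ]


print(boundary_violations(before, after))
```

&lt;p&gt;An empty list means every change stayed inside the published window. Any month in the result is a number that downstream users were told would not move.&lt;/p&gt;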
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; For one metric that can be restated, write the dates, cohorts, and dashboards you expect to move if you backfill the last 90 days.&lt;/p&gt;
&lt;p&gt;That expected-movers list gives the rerun a boundary before the total delta becomes the whole argument.&lt;/p&gt;</content:encoded><category>analytics-engineering</category><category>metrics</category><category>backfills</category><category>data-reliability</category></item><item><title>Every important model needs an explicit grain</title><link>https://berhanturkkaynagi.com/blog/posts/every-important-model-needs-an-explicit-grain/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/every-important-model-needs-an-explicit-grain/</guid><description>How I keep models trustworthy with a simple grain note: row meaning, expected key, safe joins, and the first duplication test I write.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A model can look clean in SQL and still be unsafe to use. If nobody wrote down what one row means, the first many-to-many join turns a reasonable table into an argument.&lt;/p&gt;
&lt;p&gt;I treat grain as a trust boundary, not a documentation chore. Before I trust a revenue sum or a dashboard card, I want one line that says what the row represents, which key should be unique, and which joins preserve that meaning.&lt;/p&gt;
&lt;p&gt;If the row meaning is fuzzy here, &lt;a href=&quot;/blog/posts/stop-metric-drift-across-dashboards/&quot;&gt;the definition work that stops metric drift across dashboards&lt;/a&gt; starts from the wrong model. If the source rules are fuzzy too, I start earlier with a minimum data contract: expected keys, lateness window, and the rule for bad records.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Imagine &lt;code&gt;fct_customer_orders&lt;/code&gt; is meant to hold one row per completed &lt;code&gt;order_id&lt;/code&gt;. A dashboard request comes in for revenue by marketing channel, so someone joins that model directly to &lt;code&gt;fct_sessions&lt;/code&gt; on &lt;code&gt;customer_id&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The query runs. The chart even looks plausible. But one customer can have several sessions before one order, so booked revenue gets repeated across joined rows. The SQL is only part of the problem. The deeper issue is that the team cannot say, in one sentence, what the model is supposed to preserve.&lt;/p&gt;
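&lt;p&gt;A tiny Python illustration of that fan-out, with hypothetical rows: one completed order and three sessions for the same customer. The column names follow the models named above, but the data is invented.&lt;/p&gt;

```python
# Hypothetical rows: one completed order and three sessions for one customer.
orders = [{"order_id": "o-1", "customer_id": "c-1", "revenue_usd": 100}]
sessions = [
    {"session_id": "s-1", "customer_id": "c-1", "channel": "email"},
    {"session_id": "s-2", "customer_id": "c-1", "channel": "paid_search"},
    {"session_id": "s-3", "customer_id": "c-1", "channel": "organic"},
]

# Join on customer_id, the way the dashboard request did.
joined = [
    {**o, **s}
    for o in orders
    for s in sessions
    if o["customer_id"] == s["customer_id"]
]

true_revenue = sum(o["revenue_usd"] for o in orders)
joined_revenue = sum(row["revenue_usd"] for row in joined)

# The order row is repeated once per session, so revenue is triple-counted.
print(true_revenue, joined_revenue)
```

&lt;p&gt;The join returns three rows for one order, so the summed revenue is 300 instead of 100. Nothing in the query fails. Only the declared grain says the result is wrong.&lt;/p&gt;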
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Write the grain in one line: one row equals which business entity, at which time boundary, and what state or event qualifies it.&lt;/li&gt;
&lt;li&gt;Name the key or key combination that should be unique at that grain, and what should happen when duplicates appear.&lt;/li&gt;
&lt;li&gt;List the joins that preserve the grain and the ones that require reshaping or pre-aggregation first.&lt;/li&gt;
&lt;li&gt;Mark which measures are safe to sum, count, or average from the model.&lt;/li&gt;
&lt;li&gt;Add one test that fails when the declared grain starts duplicating.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That is enough context to block most accidental many-to-many joins before they ship.&lt;/p&gt;
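&lt;p&gt;The duplication test in the last bullet can be very small. This is a sketch in Python rather than a specific testing framework. The &lt;code&gt;duplicated_keys&lt;/code&gt; helper and the sample rows are hypothetical.&lt;/p&gt;

```python
from collections import Counter


def duplicated_keys(rows, key_columns):
    """Return key values that appear more than once at the declared grain."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical rows for fct_customer_orders; order o-2 is duplicated on purpose.
rows = [
    {"order_id": "o-1", "revenue_usd": 100},
    {"order_id": "o-2", "revenue_usd": 250},
    {"order_id": "o-2", "revenue_usd": 250},
]

duplicates = duplicated_keys(rows, ["order_id"])
print(duplicates)
```

&lt;p&gt;A non-empty result means the declared grain no longer holds, and any sum taken from the model should be treated as suspect until the duplicate is explained.&lt;/p&gt;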
&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
&lt;p&gt;This is the lightweight grain note I would want next to a business-critical order model:&lt;/p&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;




































&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Field&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Model&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;fct_customer_orders&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Declared grain&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;one row = one completed order_id&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Expected unique key&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;order_id&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Safe joins&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;dim_customers on customer_id (many orders to one customer)
dim_dates on order_date (many orders to one date)&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Unsafe without reshaping first&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;fct_sessions on customer_id
ad_clicks on customer_id&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Safe measures from this model&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;sum(revenue_usd)
count(order_id)
avg(order_value_usd)&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;First grain test&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;fail if count(*) != count(distinct order_id)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
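&lt;p&gt;The first grain test in that note can be sketched in a few lines. This is a minimal sketch, assuming an in-memory SQLite database through Python's standard &lt;code&gt;sqlite3&lt;/code&gt; module; the helper names &lt;code&gt;grain_counts&lt;/code&gt; and &lt;code&gt;grain_holds&lt;/code&gt; are hypothetical, and a real project would more likely express the same check in its own testing tool.&lt;/p&gt;

```python
import sqlite3


def grain_counts(conn, table, key):
    """Return (total_rows, distinct_keys) for a declared grain key."""
    # Table and key names are interpolated, so pass trusted identifiers only.
    query = f"select count(*), count(distinct {key}) from {table}"
    total_rows, distinct_keys = conn.execute(query).fetchone()
    return total_rows, distinct_keys


def grain_holds(conn, table, key):
    """True when every row carries a distinct, non-null key."""
    total_rows, distinct_keys = grain_counts(conn, table, key)
    return total_rows == distinct_keys


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table fct_customer_orders "
    "(order_id integer, customer_id text, revenue_usd integer)"
)
conn.executemany(
    "insert into fct_customer_orders values (?, ?, ?)",
    [(1001, "C42", 120), (1002, "C77", 80)],
)
clean_result = grain_holds(conn, "fct_customer_orders", "order_id")

# Simulate grain drift: the same order_id lands twice.
conn.execute("insert into fct_customer_orders values (1001, 'C42', 120)")
drifted_result = grain_holds(conn, "fct_customer_orders", "order_id")
```

&lt;p&gt;Because &lt;code&gt;count(distinct ...)&lt;/code&gt; ignores nulls, a null &lt;code&gt;order_id&lt;/code&gt; also makes the two counts disagree, so the same check catches missing keys as well as repeated ones.&lt;/p&gt;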
&lt;p&gt;Now imagine the data looks like this:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;fct_customer_orders&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;



















&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;order_id&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;customer_id&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;revenue_usd&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;1001&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;C42&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;120&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;1002&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;C77&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;80&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;fct_sessions&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;





























&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;session_id&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;customer_id&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;channel&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;s1&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;C42&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;paid&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;s2&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;C42&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;email&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;s3&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;C42&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line 
text-text-muted&quot;&gt;organic&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;s4&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;C77&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;paid&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A direct join can look innocent:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;  s&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;channel&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;  sum&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;o&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;revenue_usd&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;) &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; revenue_usd&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; fct_customer_orders o&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;join&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; fct_sessions s&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;  on&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; o&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;customer_id&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; =&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; s&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;customer_id&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;group by&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; 1&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If I run that, order &lt;code&gt;1001&lt;/code&gt; shows up three times because customer &lt;code&gt;C42&lt;/code&gt; had three sessions. Summed across channels, revenue becomes &lt;code&gt;440&lt;/code&gt; instead of &lt;code&gt;200&lt;/code&gt;: paid shows 200, while email and organic each claim the same 120.&lt;/p&gt;
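&lt;p&gt;The fan-out is easy to reproduce. The sketch below loads the two sample tables into an in-memory SQLite database (an assumption made only so the example runs anywhere) and executes the same direct join, then compares the joined total with the total taken from the order model alone.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table fct_customer_orders (
        order_id integer, customer_id text, revenue_usd integer
    );
    create table fct_sessions (
        session_id text, customer_id text, channel text
    );
    insert into fct_customer_orders values
        (1001, 'C42', 120),
        (1002, 'C77', 80);
    insert into fct_sessions values
        ('s1', 'C42', 'paid'),
        ('s2', 'C42', 'email'),
        ('s3', 'C42', 'organic'),
        ('s4', 'C77', 'paid');
    """
)

# The direct join: every session of a customer repeats that customer's order.
rows = conn.execute(
    """
    select s.channel, sum(o.revenue_usd) as revenue_usd
    from fct_customer_orders o
    join fct_sessions s
      on o.customer_id = s.customer_id
    group by 1
    """
).fetchall()
by_channel = dict(rows)
joined_total = sum(by_channel.values())

# Revenue at the declared grain, with no join involved.
true_total = conn.execute(
    "select sum(revenue_usd) from fct_customer_orders"
).fetchone()[0]
```

&lt;p&gt;Comparing &lt;code&gt;joined_total&lt;/code&gt; with &lt;code&gt;true_total&lt;/code&gt; is a cheap review habit: if a join is supposed to preserve order grain, the two numbers should match.&lt;/p&gt;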
&lt;p&gt;The safer move is to reshape the session side to order grain before I bring it onto the order model. I pick one attribution rule, build a helper model with one row per &lt;code&gt;order_id&lt;/code&gt;, and join that helper back to orders. Once both sides share grain, the revenue sum is safe again.&lt;/p&gt;
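&lt;p&gt;Here is one way that reshaping could look. Everything about the attribution rule is an assumption for illustration: the sample tables show no timestamps, so the &lt;code&gt;ordered_at&lt;/code&gt; and &lt;code&gt;started_at&lt;/code&gt; columns, their values, the last-touch rule, and the helper name &lt;code&gt;order_channel_last_touch&lt;/code&gt; are all hypothetical. The point is only that the helper returns one row per &lt;code&gt;order_id&lt;/code&gt; before it touches revenue.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table fct_customer_orders (
        order_id integer, customer_id text, revenue_usd integer, ordered_at text
    );
    create table fct_sessions (
        session_id text, customer_id text, channel text, started_at text
    );
    -- Timestamps are hypothetical; the sample tables do not include them.
    insert into fct_customer_orders values
        (1001, 'C42', 120, '2026-01-04'),
        (1002, 'C77', 80, '2026-01-06');
    insert into fct_sessions values
        ('s1', 'C42', 'paid', '2026-01-01'),
        ('s2', 'C42', 'email', '2026-01-03'),
        ('s3', 'C42', 'organic', '2026-01-05'),
        ('s4', 'C77', 'paid', '2026-01-02');

    -- Helper at order grain: the last session at or before the order.
    create view order_channel_last_touch as
    select
        o.order_id,
        (
            select s.channel
            from fct_sessions s
            where s.customer_id = o.customer_id
              and o.ordered_at >= s.started_at
            order by s.started_at desc
            limit 1
        ) as channel
    from fct_customer_orders o;
    """
)

helper_rows = conn.execute(
    "select count(*), count(distinct order_id) from order_channel_last_touch"
).fetchone()

# Both sides now share order grain, so the revenue sum is safe.
rows = conn.execute(
    """
    select coalesce(a.channel, 'unattributed') as channel,
           sum(o.revenue_usd) as revenue_usd
    from fct_customer_orders o
    join order_channel_last_touch a
      on a.order_id = o.order_id
    group by 1
    """
).fetchall()
by_channel = dict(rows)
attributed_total = sum(by_channel.values())
```

&lt;p&gt;Under this rule the organic session falls after order &lt;code&gt;1001&lt;/code&gt; and is ignored, so the order is credited to email once. A different attribution rule would move revenue between channels, but it should never change the total.&lt;/p&gt;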
&lt;p&gt;If row counts stay roughly stable while duplicates sneak into the declared key, I fall back to &lt;a href=&quot;/blog/posts/row-counts-are-not-enough/&quot;&gt;the check set I use before I trust a pipeline&lt;/a&gt;. Stable volume does not protect me from grain drift.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: a legacy model already mixes order, order-line, and session logic in one table → Mitigation: write the current grain note first, mark unsafe measures clearly, and split the highest-risk use case into a cleaner model before attempting a full rewrite.&lt;/li&gt;
&lt;li&gt;Breaks when: source keys are unstable and duplicates are normal upstream behavior → Mitigation: quarantine raw duplicates, state that the curated model is not yet authoritative, and tighten the source rule before downstream teams start summing it.&lt;/li&gt;
&lt;li&gt;Breaks when: one business question genuinely needs two grains, such as order conversion and session behavior in the same analysis → Mitigation: keep separate models for each grain and join only after pre-aggregating both sides to the decision grain.&lt;/li&gt;
&lt;li&gt;Breaks when: code review checks SQL syntax and row counts but never checks key duplication → Mitigation: add one uniqueness test on the declared grain and treat failures there as trust failures, not minor cleanup.&lt;/li&gt;
&lt;/ul&gt;
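&lt;p&gt;For the two-grain case, pre-aggregating both sides might look like the sketch below. It assumes the decision grain is &lt;code&gt;customer_id&lt;/code&gt;, which is a choice made for this example rather than a rule, and it reuses the sample rows in an in-memory SQLite database.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table fct_customer_orders (
        order_id integer, customer_id text, revenue_usd integer
    );
    create table fct_sessions (
        session_id text, customer_id text, channel text
    );
    insert into fct_customer_orders values
        (1001, 'C42', 120),
        (1002, 'C77', 80);
    insert into fct_sessions values
        ('s1', 'C42', 'paid'),
        ('s2', 'C42', 'email'),
        ('s3', 'C42', 'organic'),
        ('s4', 'C77', 'paid');
    """
)

# Each side is collapsed to one row per customer before the join.
# Customers with orders but no sessions would drop out of this left join;
# start from a customer dimension if they matter for the decision.
rows = conn.execute(
    """
    with orders_by_customer as (
        select customer_id,
               count(order_id) as orders,
               sum(revenue_usd) as revenue_usd
        from fct_customer_orders
        group by customer_id
    ),
    sessions_by_customer as (
        select customer_id, count(session_id) as sessions
        from fct_sessions
        group by customer_id
    )
    select s.customer_id,
           s.sessions,
           coalesce(o.orders, 0) as orders,
           coalesce(o.revenue_usd, 0) as revenue_usd
    from sessions_by_customer s
    left join orders_by_customer o
      on o.customer_id = s.customer_id
    order by s.customer_id
    """
).fetchall()
total_revenue = sum(row[3] for row in rows)
```

&lt;p&gt;Session counts and order revenue now sit side by side at one grain, and the revenue total still matches the order model.&lt;/p&gt;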
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Pick one business-critical model and write down the grain, expected key, and one join you would block before the next pull request touches it.&lt;/p&gt;
&lt;p&gt;Which model on your team is one quiet rename away from a mixed-grain row that joins still trust?&lt;/p&gt;</content:encoded><category>analytics-engineering</category><category>data-modeling</category></item><item><title>The definition card I use to stop metric drift across dashboards</title><link>https://berhanturkkaynagi.com/blog/posts/stop-metric-drift-across-dashboards/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/stop-metric-drift-across-dashboards/</guid><description>When one metric starts arguing with itself across dashboards, I use a definition card to decide whether the team needs one owned definition or separate named metrics.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;When the same metric starts arguing with itself across dashboards, I do not start with the chart. I start with the definition.&lt;/p&gt;
&lt;p&gt;One number should mean one thing for one decision. If finance, product, and operations need different answers, I would rather name those differences early than let one label drift across three dashboards.&lt;/p&gt;
&lt;p&gt;Before I compare dashboards, I want one owned definition in writing and one canonical card that every dashboard can point back to.&lt;/p&gt;
&lt;p&gt;If freshness, row loss, or join behavior are still in doubt, I run the pipeline checks and &lt;a href=&quot;/blog/posts/dashboard-number-changes/&quot;&gt;the dashboard triage sequence I run first&lt;/a&gt; before I call it metric drift.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;A metric can drift without any pipeline break. The data lands on time, row counts look normal, and two dashboards still tell two different stories.&lt;/p&gt;
&lt;p&gt;Imagine a monthly review where finance shows 18,240 active customers and product shows 24,910. If both charts are labeled &lt;code&gt;active_customer&lt;/code&gt;, the meeting turns into a debate about whose dashboard is wrong. Most of the time, the real problem is older than the chart: nobody wrote down the owner, grain, exclusions, and change history in one place.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Tie the metric to one decision and one owner before I call it reusable.&lt;/li&gt;
&lt;li&gt;Write the grain in one line: what one counted entity represents and over what time window. If the underlying model still has ambiguous row meaning, I fix that first in &lt;a href=&quot;/blog/posts/every-important-model-needs-an-explicit-grain/&quot;&gt;Every important model needs an explicit grain&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Write the inclusion and exclusion rules in plain language, not just SQL, so a reviewer can tell what the metric keeps out.&lt;/li&gt;
&lt;li&gt;Publish one canonical definition location, even if the dashboards live in different tools. Keep it next to the owned metric definition, semantic layer, or dashboard review path where people already verify the number, then point each dashboard subtitle, review doc, or release note back to that same card.&lt;/li&gt;
&lt;li&gt;Keep a short change log for metric logic changes before they ship, including the date, owner, what changed, why it matters for the decision, and which dashboards need the update.&lt;/li&gt;
&lt;li&gt;Split the metric into differently named versions when two teams need different logic.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
&lt;p&gt;Here is the definition card I want every dashboard to point back to once a metric shows up in more than one dashboard:&lt;/p&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;
















&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Field&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Dashboard label in use today&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;active_customer&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Canonical definition card&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;metrics/active_customer&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;


































&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Record&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;decision&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;owner&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;grain&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;includes&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;excludes&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;source&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;last change&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;finance_active_customer_monthly&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;monthly revenue retention review&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;finance analytics&lt;/td&gt;&lt;td class=&quot;px-5 py-4 
align-top whitespace-pre-line text-text-muted&quot;&gt;one billed_account_id per calendar month&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;account has at least one paid invoice in the month&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;trial-only accounts, fully refunded invoices&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;fct_invoice_monthly&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;2025-10-12 excluded fully refunded invoices&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;product_active_customer_28d&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;product engagement planning&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;product analytics&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;one account_id with activity in the trailing 28 days&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;account has at least one core feature event&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;internal test accounts&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;fct_product_activity_daily&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;2025-10-28 switched from session_start to core_feature_event&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I do not leave that card floating in a wiki page nobody opens. I keep one canonical copy next to the owned metric definition or dashboard review path, then make the dashboard point back to that card in the subtitle, release note, or review doc. Each change entry records the date, owner, what changed, why it matters for the decision, and which dashboards still need to move.&lt;/p&gt;
&lt;p&gt;Finance and product are not disagreeing about arithmetic. They are answering different questions under the same label.&lt;/p&gt;
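&lt;p&gt;If the cards live somewhere machine-readable, that collision can be flagged automatically. The sketch below is one possible shape, not a prescribed schema: the &lt;code&gt;DefinitionCard&lt;/code&gt; class, its field names, and the &lt;code&gt;labels_with_competing_definitions&lt;/code&gt; helper are hypothetical, while the values come from the card above.&lt;/p&gt;

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DefinitionCard:
    """One owned metric definition; field names mirror the card columns."""

    record: str
    dashboard_label: str
    decision: str
    owner: str
    grain: str
    includes: str
    excludes: str
    source: str
    last_change: str


def labels_with_competing_definitions(cards):
    """Map each dashboard label to its records when more than one uses it."""
    by_label = defaultdict(list)
    for card in cards:
        by_label[card.dashboard_label].append(card.record)
    return {
        label: records for label, records in by_label.items() if len(records) > 1
    }


cards = [
    DefinitionCard(
        record="finance_active_customer_monthly",
        dashboard_label="active_customer",
        decision="monthly revenue retention review",
        owner="finance analytics",
        grain="one billed_account_id per calendar month",
        includes="account has at least one paid invoice in the month",
        excludes="trial-only accounts, fully refunded invoices",
        source="fct_invoice_monthly",
        last_change="2025-10-12 excluded fully refunded invoices",
    ),
    DefinitionCard(
        record="product_active_customer_28d",
        dashboard_label="active_customer",
        decision="product engagement planning",
        owner="product analytics",
        grain="one account_id with activity in the trailing 28 days",
        includes="account has at least one core feature event",
        excludes="internal test accounts",
        source="fct_product_activity_daily",
        last_change="2025-10-28 switched from session_start to core_feature_event",
    ),
]
conflicts = labels_with_competing_definitions(cards)
```

&lt;p&gt;A non-empty result does not say which definition is right. It only says the label needs a decision: one owned definition, or two differently named metrics.&lt;/p&gt;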
&lt;p&gt;Once I see that, I stop asking which dashboard is right. The first decision is whether we need two metrics or one owned definition. If leadership truly needs one shared &lt;code&gt;active_customer&lt;/code&gt;, I pick the decision first, write one owned definition, and retire the competing label. I do not average two dashboard calculations into a compromise.&lt;/p&gt;
&lt;p&gt;Once that shared definition changes, I still want to control the rollout and any historical rebuild separately. If the new definition is ready to ship across live dashboards, I want a short release check—labels, filters, and cutover note—before I trust the rollout. If the change also restates history, &lt;a href=&quot;/blog/posts/metric-definition-survives-backfill/&quot;&gt;How I validate a metric after a backfill&lt;/a&gt; is the next check I use before I trust the rebuilt number.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: one label is forced to serve incompatible decisions across teams → Mitigation: split the metric into separately named definitions and assign an owner to each one.&lt;/li&gt;
&lt;li&gt;Breaks when: the definition exists in a doc but never gets updated when the logic changes → Mitigation: keep the change log next to the model or dashboard review path and require it in release checks.&lt;/li&gt;
&lt;li&gt;Breaks when: a legacy metric already appears in too many dashboards to rename in one pass → Mitigation: declare one canonical definition, note the cutoff date, and migrate the highest-risk dashboards first.&lt;/li&gt;
&lt;li&gt;Breaks when: a team buys semantic-layer tooling before it agrees on owner, grain, and exclusions → Mitigation: start with a small definition card for one business-critical metric and expand only after the rules hold.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Pick one metric that appears in more than one dashboard and write down the owner, grain, time window, inclusion rules, exclusions, and last logic change before the next review.&lt;/p&gt;
&lt;p&gt;If your team is untangling metric drift right now, which metric label is still carrying two different definitions into the same review?&lt;/p&gt;</content:encoded><category>analytics-engineering</category><category>metrics</category><category>semantic-layer</category></item><item><title>When a dashboard number changes, I check these four things first</title><link>https://berhanturkkaynagi.com/blog/posts/dashboard-number-changes/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/dashboard-number-changes/</guid><description>When a dashboard number moves without warning, I isolate the failing layer—freshness, row counts, joins, or metric logic—before debating the chart.</description><pubDate>Fri, 06 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;When a dashboard number moves without warning, I need the first answer fast: which layer changed—freshness, row counts, joins, or metric logic?&lt;/p&gt;
&lt;p&gt;I do not start by defending the number. I start by isolating the layer that changed.&lt;/p&gt;
&lt;p&gt;That first pass stays in this order—freshness, row counts, joins, and metric logic—because each check rules out a whole class of failure quickly.&lt;/p&gt;
&lt;p&gt;The sequence breaks down when there are no baselines or no clear owner. In that case I say so early and create those baselines before I promise a final answer.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Imagine available inventory is down 18% on Monday morning. The dashboard may be reacting to a late source, lost model rows, a join that dropped valid records, or a metric-logic change.&lt;/p&gt;
&lt;p&gt;Those are different failure modes. If I treat them as one problem, the incident gets slower and the conversation gets noisy.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Check freshness first: did the source land on time, did the job finish cleanly, and is the latest successful publish recent enough to trust?&lt;/li&gt;
&lt;li&gt;Check row counts next: did the main model lose or gain more volume than expected versus recent baselines?&lt;/li&gt;
&lt;li&gt;Check join behavior after that: did a dimension change create duplicate, unmatched, or mis-mapped rows?&lt;/li&gt;
&lt;li&gt;Check metric logic last: did someone change filters, time windows, exclusions, or semantic rules on purpose, and did that change go through a dashboard release check before it hit production?&lt;/li&gt;
&lt;li&gt;Send a short status update once I know which layer is failing and what I am checking next.&lt;/li&gt;
&lt;/ul&gt;
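&lt;p&gt;The ordering in that list can be sketched as a small function. This is a simplified illustration, assuming each check is wrapped in a callable that returns true when the layer looks healthy; the names &lt;code&gt;CHECK_ORDER&lt;/code&gt; and &lt;code&gt;first_failing_layer&lt;/code&gt; and the example results are hypothetical.&lt;/p&gt;

```python
CHECK_ORDER = ("freshness", "row counts", "joins", "metric logic")


def first_failing_layer(checks):
    """Run checks in triage order and return the first layer that fails.

    `checks` maps each layer name to a zero-argument callable that returns
    True when the layer looks healthy. Returns None when all four pass.
    """
    for layer in CHECK_ORDER:
        if not checks[layer]():
            return layer
    return None


# Hypothetical results for one incident: joins are the first layer that is off.
example_checks = {
    "freshness": lambda: True,
    "row counts": lambda: True,
    "joins": lambda: False,
    "metric logic": lambda: True,
}
failing_layer = first_failing_layer(example_checks)

all_clear = first_failing_layer({layer: (lambda: True) for layer in CHECK_ORDER})
```

&lt;p&gt;Stopping at the first failing layer keeps the first answer fast. A fuller version would record the result of every check, since the status update reports all four layers.&lt;/p&gt;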
&lt;p&gt;I keep that update short so the conversation stays calm and scoped. This is the copy/paste template I use:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;txt&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Status: investigating &amp;#x3C;metric&gt; &amp;#x3C;up/down X%&gt; vs &amp;#x3C;baseline&gt; (as of &amp;#x3C;time&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Freshness: &amp;#x3C;ok/late&gt; — &amp;#x3C;one evidence line&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Row counts: &amp;#x3C;ok/off&gt; — &amp;#x3C;one evidence line&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Joins: &amp;#x3C;ok/off&gt; — &amp;#x3C;one evidence line&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Metric logic: &amp;#x3C;ok/changed&gt; — &amp;#x3C;one evidence line&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Hypothesis: &amp;#x3C;one sentence&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Next check: &amp;#x3C;one sentence&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Next update: &amp;#x3C;time&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When the issue affects a meeting or decision, I keep &lt;a href=&quot;/blog/posts/analytics-incident-note-template/&quot;&gt;The incident note template I wish every analytics team used&lt;/a&gt; open beside these checks so the timeline, evidence, and next update stay in one place.&lt;/p&gt;
&lt;p&gt;If freshness, row counts, joins, and metric logic all come back clean and the disagreement remains, I stop treating it like a pipeline incident. My next check is the owned metric definition and any recent BI release evidence.&lt;/p&gt;
&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
&lt;p&gt;Here is a simple version of the sequence for that Monday inventory drop:&lt;/p&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;












&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Field&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Symptom&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;available inventory is down 18%&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;
&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Review step&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;What I confirm&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Status&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Freshness&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;inventory snapshot extract completed at 07:12 ET
latest snapshot date matches expectation&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;—&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Row counts&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;stg_inventory_snapshot row count is down 1.4% vs recent Mondays
not enough to explain the full drop&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;—&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Join behavior&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;unmatched location_id values jump from 0.3% to 14.8%
available units disappear after the join to location attributes&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;—&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Metric logic&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;no intentional dashboard filter or definition change&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;—&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Conclusion: the issue is missing location mappings, not a true inventory decline.&lt;/p&gt;
&lt;p&gt;The query I want ready for this step is usually simple:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;  count&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;*&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;) &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; total_rows,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;  sum&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;case&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; when&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; l&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;location_id&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; is&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; null&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; then&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; 1&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; else&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; 0&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; end&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;) &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; unmatched_rows&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; fct_inventory_snapshot i&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;left join&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; dim_locations l&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;  on&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; i&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;location_id&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; =&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; l&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;location_id&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That sequence usually tells me which layer is failing in one pass. I rely on checks like this because I can run them at 08:05 and explain the result in one calm Slack update.&lt;/p&gt;
&lt;p&gt;The controls I want in place are boring on purpose: load completion time, row-count thresholds, and one unmatched-key check on the model that feeds the dashboard. Boring is good during an incident.&lt;/p&gt;
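&lt;p&gt;A minimal sketch of that unmatched-key control, reusing the tables from the query above. The 1% threshold is a placeholder assumption, not a recommendation; I would set it from the model’s own baseline (0.3% in this example).&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- Assumed threshold: fail when more than 1% of rows have no location match.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  count(*) as total_rows,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  sum(case when l.location_id is null then 1 else 0 end) as unmatched_rows,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  case&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    -- unmatched_rows * 100 &gt; total_rows * 1 is the same as unmatched share &gt; 1%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    when sum(case when l.location_id is null then 1 else 0 end) * 100.0 &gt; count(*) * 1.0&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    then 'fail'&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    else 'pass'&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  end as unmatched_key_status&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;from fct_inventory_snapshot i&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;left join dim_locations l&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  on i.location_id = l.location_id;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the numbers from this example, 14.8% unmatched would return &lt;code&gt;fail&lt;/code&gt; and the usual 0.3% would return &lt;code&gt;pass&lt;/code&gt;.&lt;/p&gt;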
&lt;p&gt;If freshness is bad because the run missed its cutoff, I move to &lt;a href=&quot;/blog/posts/pipeline-observability-before-more-orchestration/&quot;&gt;the five-signal observability panel for missed analytics cutoffs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If the four checks are clean and the disagreement remains, I move to &lt;a href=&quot;/blog/posts/stop-metric-drift-across-dashboards/&quot;&gt;The definition card I use to stop metric drift across dashboards&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: there is no baseline for freshness or row-count shifts → Mitigation: start logging daily load times and row counts for critical models, even if the first version is manual.&lt;/li&gt;
&lt;li&gt;Breaks when: the metric definition lives in multiple places across SQL, dbt, and BI → Mitigation: choose one owned definition and point dashboards back to it.&lt;/li&gt;
&lt;li&gt;Breaks when: late-arriving data is normal for the business → Mitigation: compare against the same latency window instead of the final settled number.&lt;/li&gt;
&lt;li&gt;Breaks when: nobody owns the upstream model or dashboard metric → Mitigation: assign one owner for the next incident before the memory of this one fades.&lt;/li&gt;
&lt;/ul&gt;
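&lt;p&gt;For the first mitigation, a baseline log can start as one small table. This is a sketch under assumptions: &lt;code&gt;model_load_log&lt;/code&gt; and its columns are hypothetical names, and the types may need adjusting for your warehouse.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- Hypothetical log table: one row per model per day.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;create table model_load_log (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  model_name   varchar(200) not null,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  load_date    date not null,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  logged_at    timestamp not null,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  row_count    bigint not null&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;);&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- Run right after the model finishes. logged_at records when this statement ran,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- which only approximates the true load completion time.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;insert into model_load_log (model_name, load_date, logged_at, row_count)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  'stg_inventory_snapshot',&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  current_date,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  current_timestamp,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  count(*)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;from stg_inventory_snapshot;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A few weeks of rows in a table like this is enough to say what “down 1.4% vs recent Mondays” means for a given model.&lt;/p&gt;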
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Before the next incident, pick one important metric and write down the freshness, row-count, join, and logic checks you expect before anyone asks questions about it.&lt;/p&gt;
&lt;p&gt;If the next KPI move would still send everyone to Slack first, use that four-check order as the first triage note before rewriting SQL.&lt;/p&gt;</content:encoded><category>analytics-engineering</category><category>data-quality</category><category>data-pipelines</category></item><item><title>Row counts are not enough: the checks I add before I trust a pipeline</title><link>https://berhanturkkaynagi.com/blog/posts/row-counts-are-not-enough/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/row-counts-are-not-enough/</guid><description>The five checks I use before I trust a pipeline: freshness, row counts, uniqueness, null rates, and one business-shape check that tells me whether the published data is safe to use.</description><pubDate>Fri, 06 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Row counts are a smoke test, not a pass. I have seen models land at the usual volume while a join key turns null, duplicates creep in, or one business category disappears without tripping the row-count alert.&lt;/p&gt;
&lt;p&gt;Before I trust a published model, I want five checks inside the publish path: freshness, row counts, uniqueness, null rate, and one shape check tied to business risk.&lt;/p&gt;
&lt;p&gt;Because they run inside the publish path, these checks do a different job from the operating view I watch when a run is already late or stale. If all five pass and a number still moves where the business can see it, I switch to &lt;a href=&quot;/blog/posts/dashboard-number-changes/&quot;&gt;When a dashboard number changes, I check these four things first&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If a check cannot change a response, it usually does not earn a slot.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Imagine &lt;code&gt;fct_shipments&lt;/code&gt; still lands about 1.2 million rows after an upstream schema change. At first glance the pipeline looks healthy.&lt;/p&gt;
&lt;p&gt;But the source team renamed a warehouse mapping field, the transform still runs, and 22% of rows now carry a null &lt;code&gt;warehouse_id&lt;/code&gt;. The row count did its job; it just did not protect the downstream metric.&lt;/p&gt;
&lt;p&gt;That kind of miss shows up later as a regional fill-rate problem, a missing warehouse view, or a dashboard nobody trusts. I would rather catch it in the pipeline than explain it in a meeting.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Check freshness first: did the extract land on time, did the model finish when I expected, and is the latest successful publish recent enough to trust?&lt;/li&gt;
&lt;li&gt;Check row counts next to rule out a big volume shift. If the count looks normal, keep going. Stable volume is not a pass when keys can go null, duplicates can creep in, or one category can quietly disappear.&lt;/li&gt;
&lt;li&gt;Check uniqueness on the keys that drive downstream joins, mappings, or counts.&lt;/li&gt;
&lt;li&gt;Check null rate on fields that would break joins, mappings, or business logic if they go missing.&lt;/li&gt;
&lt;li&gt;Check one accepted-values or distribution rule on a high-risk column so I catch shape changes, not just missing rows.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After those checks, each one needs an owner and a first response step.&lt;/p&gt;
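&lt;p&gt;The freshness check in that list can be as small as the sketch below. It assumes a hypothetical &lt;code&gt;loaded_at&lt;/code&gt; timestamp column on &lt;code&gt;fct_shipments&lt;/code&gt; and treats “loaded today” as fresh; both are assumptions to replace with the model’s real load column and cutoff.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- loaded_at is a hypothetical column; the "today" rule is an assumed cutoff.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  max(loaded_at) as latest_load,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  case&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    when cast(max(loaded_at) as date) &gt;= current_date then 'fresh'&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    else 'stale'&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  end as freshness_status&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;from fct_shipments;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;An empty table returns &lt;code&gt;stale&lt;/code&gt; here, because a null &lt;code&gt;max(loaded_at)&lt;/code&gt; falls through to the &lt;code&gt;else&lt;/code&gt; branch.&lt;/p&gt;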
&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
&lt;p&gt;Here is the kind of failure I want the pipeline to catch before a dashboard does:&lt;/p&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;
&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Field&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Model&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;fct_shipments&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;
&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Record&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;row count&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;shipment_id&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;warehouse_id null rate&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;status values&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;regional split&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Expected daily shape:&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;~1.2M&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;unique&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;&amp;#x3C; 0.5%&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;shipped, cancelled, corrected&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;east 34%, central 29%, west 37%&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;After upstream schema change:&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;1.19M&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;still unique&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;22.4%&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;unchanged&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;west volume collapses because null warehouse_id cannot map to region&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I keep the first check query simple on purpose:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;  count&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;*&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;) &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; row_count,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;  count&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;distinct&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; shipment_id) &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; distinct_shipment_id,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;  round&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;100&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; *&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; avg&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;case&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; when&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; warehouse_id &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;is&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; null&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; then&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; 1&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; else&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; 0&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; end&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;), &lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;2&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;) &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; warehouse_id_null_pct&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; fct_shipments;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;On very large tables, I run the distinct check on the newest partition or a rolling window, then schedule a deeper scan less often.&lt;/p&gt;
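&lt;p&gt;A rough sketch of that rolling-window version. It assumes a hypothetical &lt;code&gt;shipment_date&lt;/code&gt; column and a seven-day window, and the interval syntax varies by warehouse, so treat it as a starting shape and not a drop-in query.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- shipment_date and the 7-day window are assumptions for illustration.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- Returns one row per shipment_id that appears more than once in the window.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  shipment_id,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  count(*) as copies&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;from fct_shipments&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;where shipment_date &gt;= current_date - interval '7' day&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;group by shipment_id&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;having count(*) &gt; 1;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;An empty result is the pass condition. A duplicate that spans the window boundary is invisible to this query, which is why the slower full scan still matters.&lt;/p&gt;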
&lt;p&gt;Then I add one quick shape check next to it:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;  region,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt;  count&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;*&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;) &lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; shipments&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt; fct_shipments&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;group by&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; 1&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt;order by&lt;/span&gt;&lt;span style=&quot;color:#005CC5;--shiki-dark:#79B8FF&quot;&gt; 2&lt;/span&gt;&lt;span style=&quot;color:#D73A49;--shiki-dark:#F97583&quot;&gt; desc&lt;/span&gt;&lt;span style=&quot;color:#24292E;--shiki-dark:#E1E4E8&quot;&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If the row count is stable but one region disappears, I do not need a long debate about whether the pipeline is healthy. I already know which mapping layer to inspect.&lt;/p&gt;
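&lt;p&gt;When I want the result to read directly against the expected split (east 34%, central 29%, west 37%), one option is to add each region’s share of the total. This sketch assumes &lt;code&gt;region&lt;/code&gt; comes back null when &lt;code&gt;warehouse_id&lt;/code&gt; cannot be mapped, so unmapped rows show up as their own group.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-- Share of rows per region; a null region group means unmapped warehouses.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  region,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  count(*) as shipments,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  round(100.0 * count(*) / sum(count(*)) over (), 1) as pct_of_total&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;from fct_shipments&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;group by region&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;order by pct_of_total desc;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;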
&lt;p&gt;If these checks pass and two dashboards still disagree, I stop debugging the pipeline and move to definition ownership. &lt;a href=&quot;/blog/posts/stop-metric-drift-across-dashboards/&quot;&gt;The definition card I use to stop metric drift across dashboards&lt;/a&gt; is the handoff I make next.&lt;/p&gt;
&lt;p&gt;I want a small number of business-aware checks, not a long catalog of generic ones. A null-rate check on &lt;code&gt;warehouse_id&lt;/code&gt; earns its place. Ten low-risk checks on columns nobody uses usually do not earn a slot.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: every column gets the same battery of checks → Mitigation: start with fields tied to joins, filters, revenue, inventory, or service-level reporting.&lt;/li&gt;
&lt;li&gt;Breaks when: thresholds are copied from another model with different behavior → Mitigation: set baselines from recent history and review them when the source changes.&lt;/li&gt;
&lt;li&gt;Breaks when: alerts fire but nobody knows the first investigation step → Mitigation: attach each check to an owner and a short runbook question such as “freshness, join, or mapping?”&lt;/li&gt;
&lt;li&gt;Breaks when: teams assume schema tests cover everything important → Mitigation: add one or two data-shape checks that reflect real business failure modes, not just table structure.&lt;/li&gt;
&lt;li&gt;Breaks when: full-table uniqueness checks are too expensive to run on every build → Mitigation: check uniqueness on the newest partition or a rolling window, and run a deeper scan on a slower cadence.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Pick one business-critical model and add one freshness check, one volume check, one uniqueness check, one null-rate check, and one business-shape check before the next source change lands.&lt;/p&gt;
&lt;p&gt;If a row-count check already passes while the business still distrusts the output, use one failed shape check from that model to decide which test deserves ownership first.&lt;/p&gt;</content:encoded><category>analytics-engineering</category><category>data-quality</category><category>data-pipelines</category><category>data-reliability</category></item><item><title>The 6-part data contract I want before I trust a source table</title><link>https://berhanturkkaynagi.com/blog/posts/minimum-data-contract-source-table/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/minimum-data-contract-source-table/</guid><description>My six-part source-table contract covers row meaning, key rule, landing cadence, valid-record handling, correction window, and owner so downstream analytics work does not start with guesses.</description><pubDate>Fri, 06 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A new source table can look useful long before it is safe to trust. If nobody agrees on row meaning, key rule, landing cadence, and valid-record handling, the first downstream model bakes in assumptions nobody approved.&lt;/p&gt;
&lt;p&gt;Before I add the checks from &lt;a href=&quot;/blog/posts/row-counts-are-not-enough/&quot;&gt;Row counts are not enough: the checks I add before I trust a pipeline&lt;/a&gt;, I want a short six-part source contract:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;row meaning&lt;/li&gt;
&lt;li&gt;key rule&lt;/li&gt;
&lt;li&gt;landing cadence&lt;/li&gt;
&lt;li&gt;valid-record handling&lt;/li&gt;
&lt;li&gt;correction window&lt;/li&gt;
&lt;li&gt;owner&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is small enough to agree on quickly and specific enough to stop downstream guessing.&lt;/p&gt;
&lt;p&gt;Even when the source is exploratory or changing fast, I still write the short version and label the trust boundary up front. If the row meaning is still fuzzy after that contract pass, I usually need &lt;a href=&quot;/blog/posts/every-important-model-needs-an-explicit-grain/&quot;&gt;an explicit grain note&lt;/a&gt; before the downstream model is safe to reuse.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Imagine a warehouse integration starts sending a &lt;code&gt;shipment_events&lt;/code&gt; table for on-time-in-full (OTIF) and fill-rate reporting. Monday’s load looks normal. By Tuesday, planners see duplicate shipment corrections in one lane and a small set of rows with no &lt;code&gt;warehouse_id&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;At that point, the analytics team can keep patching transforms around ambiguous source behavior. The better move is to stop and write the contract that says what a row means, which records are valid, and how long corrections can restate history. I have learned that this is cheaper than repairing the metric layer later.&lt;/p&gt;
&lt;h2 id=&quot;default-approach&quot;&gt;Default approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Define the row meaning first: one row should mean one shipment event, one shipment line, or one snapshot record, not a mix of all three.&lt;/li&gt;
&lt;li&gt;Define the key rule next: which column or column set should be unique, and when can retries or duplicates appear?&lt;/li&gt;
&lt;li&gt;Define landing cadence and lateness: how often should the table land, and when is it officially late?&lt;/li&gt;
&lt;li&gt;Define valid-record rules for critical fields and the response when they fail: quarantine, block publish, or allow with a flag.&lt;/li&gt;
&lt;li&gt;Define the correction window: can rows be updated or deleted, how are corrections flagged, and when does history stop moving?&lt;/li&gt;
&lt;li&gt;Define the owner: who confirms the rule and who answers first when a check fails?&lt;/li&gt;
&lt;/ul&gt;
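&lt;p&gt;The key rule and the valid-record rule translate into a quick profile query. This is a sketch, not the contract itself; the column names follow the example contract in this post, and what counts as invalid is an assumption to confirm with the source owner.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes github-light github-dark&quot; style=&quot;background-color:#fff;--shiki-dark-bg:#24292e;color:#24292e;--shiki-dark:#e1e4e8; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;select&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  count(*) as total_rows,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  -- Key rule: rows beyond the first per shipment_event_id (null IDs also count here).&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  count(*) - count(distinct shipment_event_id) as duplicate_or_null_key_rows,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  -- Valid-record rule: rows that would be quarantined under the assumed rule.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  sum(case when warehouse_id is null or sku is null then 1 else 0 end) as quarantine_candidates&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;from shipment_events;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If either count is above zero on the first load, that is the conversation to have with the owner before any downstream model is built.&lt;/p&gt;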
&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
&lt;p&gt;Here is how I would turn those six parts into a lightweight contract for a new &lt;code&gt;shipment_events&lt;/code&gt; table before I use it in executive reporting. I keep the contract in the same six-part order so the field-level rules line up with the actual checks.&lt;/p&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;
&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Field&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Table&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;shipment_events&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;Use case&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;OTIF and fill-rate reporting&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;not-prose my-8 overflow-hidden rounded-2xl border border-black/10 bg-white dark:border-white/10 dark:bg-base-900&quot; data-inline-bearnie-table=&quot;&quot;&gt;&lt;div class=&quot;relative w-full overflow-auto&quot;&gt;
&lt;table class=&quot;w-full caption-bottom text-sm&quot;&gt;&lt;thead&gt;&lt;tr class=&quot;border-b border-black/10 dark:border-white/10&quot;&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Contract field&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Expectation&lt;/th&gt;&lt;th class=&quot;px-5 py-3 text-left align-top text-xs font-medium uppercase tracking-wide text-text-muted&quot; scope=&quot;col&quot;&gt;Why it matters&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;row meaning&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;one row = one shipment event&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;stops grain drift downstream&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;shipment_event_id&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;unique per source event&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;retry duplicates do not inflate counts&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;landing cadence&lt;/td&gt;&lt;td class=&quot;px-5 py-4 
align-top whitespace-pre-line text-text-muted&quot;&gt;daily load by 06:00 UTC&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;tells me when freshness is late&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;event_ts&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;always present in UTC&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;defines lateness and daily bucketing&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;warehouse_id&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;non-null, valid mapped warehouse&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;joins do not silently drop records&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;sku&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;non-null product identifier&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;shipment and inventory logic can reconcile&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black 
dark:text-white&quot;&gt;quantity&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;signed numeric value, never null&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;corrections and reversals stay explainable&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;event_type&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;allowed set: ship, cancel, correct&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;event flow is explicit&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;invalid record action&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;quarantine rows missing warehouse_id or sku&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;bad records do not leak into KPI tables&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;change behavior&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;corrections may arrive within 72 hours&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;downstream models know history can move&lt;/td&gt;&lt;/tr&gt;&lt;tr class=&quot;border-b border-black/10 transition-colors last:border-b-0 
dark:border-white/10&quot;&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted font-medium text-black dark:text-white&quot;&gt;owner&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;WMS integration team&lt;/td&gt;&lt;td class=&quot;px-5 py-4 align-top whitespace-pre-line text-text-muted&quot;&gt;someone can confirm or fix the rule&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The questions I want settled are simple: can retries duplicate raw events, when is the daily load officially late, how long can corrections restate history, and do rows without &lt;code&gt;warehouse_id&lt;/code&gt; get quarantined or allowed through with a flag? Those answers belong before the first KPI review, not during it.&lt;/p&gt;
&lt;p&gt;Once that contract exists, I can turn it into boring controls: a freshness check on landing time, a uniqueness check on &lt;code&gt;shipment_event_id&lt;/code&gt;, null checks on &lt;code&gt;warehouse_id&lt;/code&gt; and &lt;code&gt;sku&lt;/code&gt;, accepted values on &lt;code&gt;event_type&lt;/code&gt;, and a publish rule that keeps quarantined rows out of executive reporting.&lt;/p&gt;
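&lt;p&gt;A minimal sketch of how the row-level controls could look in Python, assuming raw rows arrive as dictionaries keyed by the contract field names. The function name &lt;code&gt;split_by_contract&lt;/code&gt;, the reason strings, and the decision to treat a missing &lt;code&gt;shipment_event_id&lt;/code&gt; as invalid are illustrative assumptions, not a fixed rule. The freshness check is left out because it works on load metadata rather than on rows.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;ALLOWED_EVENT_TYPES = {'ship', 'cancel', 'correct'}
# Assumption: shipment_event_id is required alongside warehouse_id and sku.
REQUIRED_FIELDS = ('shipment_event_id', 'warehouse_id', 'sku')


def split_by_contract(rows):
    # Returns (publishable, quarantined); quarantined entries carry their reasons.
    publishable, quarantined = [], []
    published_ids = set()
    for row in rows:
        reasons = []
        # Null checks on the fields the contract requires.
        for field in REQUIRED_FIELDS:
            if row.get(field) is None:
                reasons.append('missing ' + field)
        # Accepted values on event_type.
        if row.get('event_type') not in ALLOWED_EVENT_TYPES:
            reasons.append('unexpected event_type')
        # Uniqueness: a second copy of an already accepted id is a retry duplicate.
        if row.get('shipment_event_id') in published_ids:
            reasons.append('duplicate shipment_event_id')
        if reasons:
            quarantined.append({'row': row, 'reasons': reasons})
        else:
            publishable.append(row)
            published_ids.add(row['shipment_event_id'])
    return publishable, quarantined
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Only the first list would feed executive reporting under this sketch. The second list keeps each rejected row with its reasons, so the source owner can see which rule it broke.&lt;/p&gt;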
&lt;p&gt;I want that contract next to the ingestion checks or source tests, not in a wiki that drifts out of date.&lt;/p&gt;
&lt;p&gt;Without that contract, the downstream model is never tested against the failure modes that matter. It absorbs upstream ambiguity as downstream logic, and that is usually the point where I need an explicit grain note just to make the row meaning visible again.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when: the source is exploratory and the schema changes every week → Mitigation: start with row meaning, landing cadence, and owner, then label the table non-authoritative until the field-level rules stabilize.&lt;/li&gt;
&lt;li&gt;Breaks when: the upstream team cannot guarantee uniqueness yet → Mitigation: land the raw table separately, quarantine duplicates, and keep business-facing models off it until the key behavior stabilizes.&lt;/li&gt;
&lt;li&gt;Breaks when: late corrections are normal for the business → Mitigation: publish a restatement window and a freeze rule so downstream users know which dates can still move and when the numbers become final.&lt;/li&gt;
&lt;li&gt;Breaks when: the contract document drifts away from actual source behavior → Mitigation: keep the contract beside ingestion checks or source tests and review it when the source logic changes.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Pick one critical source table and write the six contract lines that stop downstream guessing: row meaning, key rule, landing cadence, valid-record rules, correction window, and owner.&lt;/p&gt;
&lt;p&gt;If a source table keeps turning into downstream detective work, use those six lines to name the one missing boundary before another model inherits the guess.&lt;/p&gt;</content:encoded><category>analytics-engineering</category><category>data-contracts</category><category>data-engineering</category></item><item><title>How I choose an analytics stack I can debug</title><link>https://berhanturkkaynagi.com/blog/posts/my-stack/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/my-stack/</guid><description>I choose an analytics stack I can debug: trace a late run or wrong number back to the landed extract, the logic diff, the metric definition, and the owner.</description><pubDate>Thu, 05 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I choose an analytics stack I can debug under pressure, not the biggest stack on the market.&lt;/p&gt;
&lt;p&gt;When a number moves, I want to trace it to the last landed extract, the logic change, the metric definition, and the owner of the next check within minutes.&lt;/p&gt;
&lt;p&gt;My default shape is simple: land data with checks, transform in version control, test decision-changing models, publish metric definitions with an owner, and keep a small operating view for late or failed runs.&lt;/p&gt;
&lt;p&gt;Inventory, orders, and finance have different grains and failure modes, but I want the same handholds in every incident. That operating model matters more than the vendor names.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/diagrams/data-platform-flow.svg&quot; alt=&quot;A simple view of my default analytics engineering flow, from ingestion through transformation to decision-ready reporting.&quot;&gt;&lt;/p&gt;
&lt;h2 id=&quot;a-concrete-example&quot;&gt;A concrete example&lt;/h2&gt;
&lt;p&gt;If an in-stock dashboard drops across a regional warehouse network on Monday morning, I want three answers before I touch the dashboard:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Did the latest inventory snapshot land on time?&lt;/li&gt;
&lt;li&gt;Did the transform keep the expected SKU-location row counts?&lt;/li&gt;
&lt;li&gt;Did the in-stock logic or location mapping change, or did the inventory position really change?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The stack earns its place when those answers are visible under pressure. It fails when custom glue, silent logic changes, or ownership gaps only show up once the room is already reacting.&lt;/p&gt;
&lt;p&gt;If the snapshot is late, I stop at ingestion. If it landed and row counts still look normal, I move to logic and mapping before I argue with the dashboard.&lt;/p&gt;
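&lt;p&gt;That triage order is small enough to write down as a decision function. A minimal Python sketch, assuming the two checks are already available as booleans. The function name and the returned labels are illustrative. Sending abnormal row counts to the transformation layer is an assumption drawn from the second question in the list, not a stated rule.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def next_place_to_look(snapshot_on_time, row_counts_normal):
    # A late snapshot stops the search at ingestion.
    if not snapshot_on_time:
        return 'ingestion'
    # Assumption: unexpected SKU-location row counts point at the transform.
    if not row_counts_normal:
        return 'transformation'
    # Data landed and kept its shape: check in-stock logic and location mapping.
    return 'logic and mapping'
&lt;/code&gt;&lt;/pre&gt;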
&lt;p&gt;I wrote up that first debugging pass in &lt;a href=&quot;/blog/posts/dashboard-number-changes/&quot;&gt;When a dashboard number changes, I check these four things first&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;what-i-optimize-for&quot;&gt;What I optimize for&lt;/h2&gt;
&lt;p&gt;I optimize for clear boundaries and fast evidence. Each layer should tell me where to look next when a KPI changes or a run misses SLA.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Clear boundaries between ingestion, transformation, semantic logic, and reporting&lt;/li&gt;
&lt;li&gt;Versioned SQL, model tests, and diffable logic on decision-changing models&lt;/li&gt;
&lt;li&gt;Checks that separate freshness, row volume, logic, and ownership problems&lt;/li&gt;
&lt;li&gt;Enough observability to show the latest publish, run duration, and owner&lt;/li&gt;
&lt;li&gt;Tooling a team can run without routing every small change through a specialist&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;how-i-evaluate-each-layer&quot;&gt;How I evaluate each layer&lt;/h2&gt;
&lt;p&gt;When I compare tools, I ask what evidence each layer gives me when a number moves or a run misses SLA.&lt;/p&gt;
&lt;p&gt;That filter matters more than feature count.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Ingestion:&lt;/strong&gt; I want the last landed batch, a freshness signal, and the &lt;a href=&quot;/blog/posts/minimum-data-contract-source-table/&quot;&gt;source contract&lt;/a&gt; or quarantine rule for bad records.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Transformation:&lt;/strong&gt; I want versioned SQL, tests on critical models, and a fast way to diff logic changes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Orchestration and operating view:&lt;/strong&gt; I want the latest successful publish, run duration versus normal, and an owner for the next check.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;BI or semantic layer:&lt;/strong&gt; I want explicit metric definitions, visible filter logic, and fewer places for one KPI to fork into three meanings.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If a product makes automation easier but hides those artifacts, I usually skip it. I want the shortest path from symptom to cause and a clear owner at each boundary.&lt;/p&gt;
&lt;p&gt;When transformation is visible but promotion still feels opaque, I want a boring deployment record I can read quickly: state, selector, and target.&lt;/p&gt;
&lt;p&gt;Before I add workflow complexity, I ask whether a late run is already explainable from &lt;a href=&quot;/blog/posts/pipeline-observability-before-more-orchestration/&quot;&gt;a small operating view for late runs&lt;/a&gt;: source freshness, latest successful publish, run duration, output shape, and a clear owner for the first check.&lt;/p&gt;
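&lt;p&gt;One way to picture that operating view is a single record per model with a one-line summary. The sketch below is hypothetical: the class name &lt;code&gt;RunStatus&lt;/code&gt;, its field names, the use of output row count as the shape signal, and the UTC labels are all assumptions for illustration, not the schema of any particular orchestrator.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RunStatus:
    model: str
    source_landed_at: datetime
    last_successful_publish: datetime
    run_duration: timedelta
    typical_duration: timedelta
    output_rows: int
    owner: str

    def summary(self):
        # Duration relative to normal; timedelta / timedelta yields a float.
        ratio = self.run_duration / self.typical_duration
        return (
            f'{self.model}'
            f' | source landed {self.source_landed_at:%Y-%m-%d %H:%M} UTC'
            f' | last publish {self.last_successful_publish:%Y-%m-%d %H:%M} UTC'
            f' | duration {ratio:.1f}x normal'
            f' | rows {self.output_rows}'
            f' | owner {self.owner}'
        )
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If one line like this already explains a late run, more orchestration is probably not the missing piece.&lt;/p&gt;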
&lt;p&gt;When dashboards still disagree on a KPI, the problem is usually definition drift, not infrastructure. The handoff then goes to the owner of the metric definition, not the platform team.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when ingestion arrives without stable keys, lateness rules, or contract expectations. Mitigation: add source-facing checks, quarantine bad records, and delay downstream metrics instead of patching reports by hand.&lt;/li&gt;
&lt;li&gt;Breaks when one platform claims to own ingestion, transformation, semantic logic, and alerting with no clear handoffs. Mitigation: keep boundaries explicit so failures stay visible and each layer has an owner.&lt;/li&gt;
&lt;li&gt;Breaks when a platform decision is used to hide missing metric ownership or unclear model grain. Mitigation: fix the definition and modeling gaps first, because no tool choice will remove that disagreement.&lt;/li&gt;
&lt;li&gt;Breaks when the business really needs heavy real-time behavior or event-driven fan-out. Mitigation: keep the analytical path simple and add streaming components only where latency changes the decision.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Take one important metric and map it from source to dashboard, including the checks you expect at each boundary.&lt;/p&gt;
&lt;p&gt;The useful test is whether a late run can still be traced from symptom to artifact to owner without the stack hiding responsibility.&lt;/p&gt;</content:encoded><category>data-platforms</category><category>analytics-engineering</category><category>data-pipelines</category></item><item><title>What I write about: reliable analytics systems that stay explainable</title><link>https://berhanturkkaynagi.com/blog/posts/welcome/</link><guid isPermaLink="true">https://berhanturkkaynagi.com/blog/posts/welcome/</guid><description>I write about pipeline checks, metric definitions, dashboard trust, and delivery habits that keep analytics systems explainable when numbers move.</description><pubDate>Thu, 05 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I write about the checks, decisions, and handoffs I use when a KPI moves, a pipeline slips, or two dashboards disagree.&lt;/p&gt;
&lt;p&gt;If I cannot point to the query, release step, owner, or failed check behind a claim, I leave it out.&lt;/p&gt;
&lt;h2 id=&quot;what-i-cover&quot;&gt;What I cover&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Pipeline checks that catch lateness, row loss, join failures, and broken publishes before the dashboard takes the hit&lt;/li&gt;
&lt;li&gt;Modeling patterns that keep row meaning, metric definitions, and dashboard numbers legible as the warehouse and team grow&lt;/li&gt;
&lt;li&gt;Delivery habits like release checklists, incident notes, ownership boundaries, and debugging order that hold up under pressure&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;a-concrete-example&quot;&gt;A concrete example&lt;/h2&gt;
&lt;p&gt;Say Monday revenue drops in an executive dashboard and nobody expected it.&lt;/p&gt;
&lt;p&gt;I do not start with the chart. I start with the smallest checks that narrow the failure.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Confirm the source landed on time.&lt;/li&gt;
&lt;li&gt;Compare row counts and key joins in the model behind the chart.&lt;/li&gt;
&lt;li&gt;Check whether the metric logic or filter changed on purpose.&lt;/li&gt;
&lt;li&gt;Write down the current hypothesis before Slack fills with partial explanations.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That sequence is the first-pass order I use in &lt;a href=&quot;/blog/posts/dashboard-number-changes/&quot;&gt;When a dashboard number changes, I check these four things first&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;That order tells me whether I am dealing with lateness, shape, joins, or definition drift before the room turns the issue into a tooling argument.&lt;/p&gt;
&lt;p&gt;When those checks pass and two dashboards still disagree, I treat it as a definition problem, not a chart problem. I move to metric definition work before I touch the chart again.&lt;/p&gt;
&lt;p&gt;When the incident is active and other people need updates, I keep a short incident note beside the debugging work.&lt;/p&gt;
&lt;p&gt;I want each post to leave behind something I would run again: a check, a note template, a definition card, or a release habit that keeps the next incident explainable.&lt;/p&gt;
&lt;h2 id=&quot;tradeoffs&quot;&gt;Tradeoffs&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Breaks when I start writing about tools before I can name the check, contract, or decision rule that matters. Mitigation: anchor the post to the operating artifact first and mention tooling only when it changes the workflow.&lt;/li&gt;
&lt;li&gt;Breaks when the useful answer depends on team size, latency expectations, or ownership structure. Mitigation: say that boundary directly and explain what I would change under those constraints.&lt;/li&gt;
&lt;li&gt;Breaks when the real example is too client-specific to publish cleanly. Mitigation: generalize names and numbers, but keep the failure mode, decision, and mitigation honest.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;close&quot;&gt;Close&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Start with &lt;a href=&quot;/blog/posts/my-stack/&quot;&gt;my default stack and workflow&lt;/a&gt; if you want the operating model behind these notes.&lt;/p&gt;
&lt;p&gt;If the live pressure point is a late run, a KPI that moved, or a dashboard disagreement, use that situation to choose the first field note instead of reading the corpus front to back.&lt;/p&gt;</content:encoded><category>data-engineering</category><category>analytics-engineering</category></item></channel></rss>