Note: This blog post is a response to the T-SQL Tuesday #198 invitation from Meagan on her personal blog: How Do You Detect Data Changes? You can learn more about T-SQL Tuesday at tsqltuesday.com.
ETL Change Detection: Stop Full Refreshing Everything
Most ETL change detection pain is self-inflicted. Teams keep full-refreshing tables because it feels safe; however, that habit breaks at scale. As data grows, run windows stretch, compute costs climb, and refresh cadence slows.
So this is the real problem: ETL change detection. If your change signal is weak, your dashboards drift. If your change signal is solid, your platform stays trustworthy.

Starts in Silver
Use Medallion as an operating model:
- Bronze captures raw change signals.
- Silver turns those signals into business-ready entities.
- Gold serves reporting and product use cases.
Yet teams still model Silver as a source copy. That is where trust starts to erode. Instead, Silver should be a business contract: cust_nm becomes customer_name, and acct_st becomes account_status.
At this layer, one question matters: how do you detect what changed since the last successful run? In practice, the signal usually comes from timestamps, row hashes, stream events, or database transaction logs (CDC). Strong ETL change detection depends on this contract being explicit.
ETL Change Detection Decision Table
| Strategy | Best For | Main Risk |
|---|---|---|
| Timestamp | Simple, low-risk datasets | Missed deletes and dirty timestamps |
| Hash Diff | Detecting content changes | No delete signal by itself |
| Change Event Stream | Near-real-time SQL change routing into Event Hubs | Emerging pattern with source and connector limitations; not a drop-in replacement for batch merge logic |
| Source CDC | Highest-fidelity system-of-record tracking | More operational complexity and source-side overhead |
Note: Fabric Mirroring CDF is useful replication plumbing into OneLake, but in this post it is not treated as the incremental load pattern itself.
What to Use in Practice
First, use source CDC when you need durable incremental merge behavior in Silver with replayability and explicit delete semantics. Next, use Change Event Stream when you need near-real-time SQL change events routed into Event Hubs for downstream stream processing and integrations.
Meanwhile, use timestamp and hash as support patterns, not your only technique. Hashing is excellent for change comparison; however, by itself, it is not a delete strategy.
Limitations worth noting: Change Event Stream flows into Event Hubs and other stream consumers, but source and connector support is still evolving. Source CDC gives stronger replay fidelity, but it adds operational overhead. If you combine mirrored tables and stream events, you still need explicit handling for delete semantics, event ordering, and watermark alignment.
Simple MERGE Pattern for ETL Change Detection (Pseudocode)
The syntax varies by platform; however, the contract should stay the same:
latest_changes AS (
SELECT latest_event_per_business_key
FROM bronze_changes
WHERE event_time >= watermark_minus_lookback
)
MERGE INTO silver_table AS tgt
USING latest_changes AS src
ON tgt.business_key = src.business_key
WHEN MATCHED AND src.change_type = 'DELETE' THEN
UPDATE SET
tgt.is_deleted = true,
tgt.deleted_at_utc = CURRENT_TIMESTAMP,
tgt.last_modified_utc = src.event_time
WHEN MATCHED
AND src.change_type IN ('INSERT', 'UPDATE')
AND (
tgt.hash_diff <> src.hash_diff
OR tgt.last_modified_utc < src.event_time
) THEN
UPDATE SET
tgt.business_columns = src.business_columns,
tgt.hash_diff = src.hash_diff,
tgt.last_modified_utc = src.event_time,
tgt.is_deleted = false,
tgt.deleted_at_utc = NULL
WHEN NOT MATCHED AND src.change_type <> 'DELETE' THEN
INSERT (...business columns..., hash_diff, last_modified_utc, is_deleted)
VALUES (...src values..., src.hash_diff, src.event_time, false);
-- advance watermark only after successful commit
Before merging: dedupe to one latest event per business key. During merging: apply deletes intentionally and update only the changes you want to capture, especially when your source emits empty updates or other systematic noise. After merging: move the watermark only if the transaction succeeds.
War Story: Why 24-Hour Windows Fail
At first, one team loaded orders every hour with WHERE updated_at >= now() - 24h. Everything looked fine. Then finance found margin drift in historical periods.
The root cause was late-arriving updates: old orders were adjusted months later without modifying the updated_at value, so those changes fell outside the 24-hour filter. As a result, Silver never got corrected.
So, they changed three things:
- A rolling look-back window
- Stateful watermarking by source partition and log position
- Idempotent merges with deterministic ordering
If data arrives late, a narrow time filter is not a state model. Sometimes ad-hoc processes won’t update the updated_at column you rely on or flat files are delivered late. You need to account for these common scenarios to ensure your data stays accurate and trustworthy.
In practice, ETL change detection succeeds when late updates, deletes, and watermark state are handled together instead of as separate concerns.
Business Impact
This is not just an engineering optimization. Weak ETL change detection leads to stale dashboards, finance reconciliation failures, longer refresh windows, and reduced confidence in reporting.
Therefore, the goal is not only faster pipelines. The goal is repeatable trust.
Bottom Line
For most enterprise platforms, the practical order is:
- Source CDC when source-level fidelity is required
- Change Event Stream for near-real-time event distribution via Event Hubs
- Hash diff for robust change comparison
- Timestamp as fallback, not foundation
Before your next sprint, pressure-test your platform against time itself: when did the source change, when was it ingested, and when did the business expect to see it?
Then trace one record end to end across your load process: insert, update, late correction, and hard delete. If that timeline is not explicit and reproducible, your data product is running on luck.
The strongest teams do not just load data faster. They can prove data truth across timelines.
If your team is wrestling with this, you are not alone. Contact us at ProcureSQL and we can help bring clarity to your load architecture, timeline integrity, and delete strategy.