AzureMX — Option B

Vertical stepper that expands on click. Only one big section at a time.

Click a step to expand. Great for narrative walkthroughs without wasting space.
1. Sources SQL Server • Legacy DBs • Flat files
  • Multiple patterns
    Tables, CDC/watermarks, file drops, snapshots.
  • Why metadata
    Avoid per-source/per-table custom pipelines.
Sources:
- SQL Server (tables)
- Legacy RDBMS (on-prem)
- SFTP/Share drives (CSV/TXT/Parquet)
2. Metadata Repository Load type • keys • watermarks • target mapping
  • Controls behavior
    Source details, columns, PKs, incremental columns, target paths.
  • Fast onboarding
    New datasets added by metadata, not new code.
metadata.table_config (concept)
source_system = sqlserver_erp
schema       = sales
table        = orders
load_type    = incremental
pk           = order_id
watermark    = last_updated_ts
target_path  = /curated/sales/orders/
3. Azure Data Factory (Control Plane) ForEach metadata → parameterized execution
  • Dynamic execution
    Reads metadata and runs the same pattern for hundreds of tables.
  • Operational reliability
    Dependencies, retries, and centralized run tracking.
ADF pipeline:
1) lookup metadata rows
2) foreach table_config row
3) set params (paths, load_type, watermark)
4) trigger Databricks notebook
5) update audit status
4. ADLS Gen2 (Raw Zone) Immutable landing + replayable paths
  • Standard folder structure
    source_system/schema/table/load_date
  • Backdated processing
    Override watermarks to re-run historical windows.
/raw/sqlserver_erp/sales/orders/load_date=2026-01-10/...
/raw/files_sftp/hr/employees/load_date=2026-01-10/employees.csv
5. Databricks → Curated / Synapse Full / Incremental MERGE / Snapshot append + auditing
  • Load-type logic
    Overwrite for full, MERGE for incremental, append for snapshots.
  • Audit + validation
    Row counts, duration, success/failure + errors.
Incremental (concept):
MERGE INTO curated.orders t
USING staged.orders s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE ...
WHEN NOT MATCHED THEN INSERT ...

Audit:
src_count, tgt_count, start_ts, end_ts, status