top of page

Data Warehouse Migration Testing: The Complete Checklist for Data Engineers

  • Writer: Vexdata
    Vexdata
  • Apr 13
  • 10 min read
data warehouse migration testing checklist

If you've been involved in a data warehouse migration, you know the pattern. The project plan looks thorough. The technical team is experienced. The timeline seems reasonable. And then something breaks at 2 AM, two days before go-live, because a VARCHAR(255) field in the source system became VARCHAR(128) in the target — silently, without error, truncating eight months of customer notes.


Most data warehouse migration guides focus on the project: timelines, stakeholder communication, cutover planning, rollback strategy. Those things matter. But they're not what breaks migrations.


What breaks migrations is data. Specifically, data that was never tested systematically at every stage of the process.


This checklist is for data engineers and QA leads who need a rigorous, phase-by-phase testing framework — not a project management template. It covers every test type that matters, when to run it, who owns it, and what a pass/fail looks like. Use it as a working document alongside your migration.


"83% of data migration projects fail, exceed budget, or disrupt business operations." The reason is almost always data quality — not tooling, not timelines, not team capability.


Why Data Warehouse Migration Testing Is Different from Regular QA


Standard application QA tests functional behaviour — does the button work, does the form submit, does the API return the right status code. Data warehouse migration testing is a different discipline entirely.


You're not testing whether the system functions. You're testing whether every row of data that existed in the source warehouse arrived in the target warehouse with the correct values, in the correct structure, with the correct relationships intact — at a scale that makes manual verification impossible.


The testing surface is enormous. A mid-sized enterprise data warehouse typically contains:

  • Hundreds to thousands of tables across multiple schemas

  • Billions of individual field values to verify

  • Complex transformation logic applied during extraction or loading

  • Referential relationships that span dozens of tables

  • Business rules that exist only in the institutional knowledge of the team

  • Historical data accumulated over years, often with legacy quality issues baked in


Any one of these can fail silently. No error is thrown. The migration completes. The data arrives — but wrong.


That's why the testing framework for a data warehouse migration needs to be systematic, automated, and phase-specific. The checklist below gives you exactly that.


  Phase 1    Pre-Migration Testing  —  Before a Single Row Moves


Pre-migration testing is the most valuable and most skipped phase of data warehouse validation. Every issue you find here costs a fraction of what it costs to find post-migration. The goal is to understand exactly what you're working with before the migration begins.


1.1  Source Data Profiling

Data profiling means analysing the source warehouse to characterise what the data actually looks like — not what the data dictionary says it should look like. These two things are rarely identical.

Run profiling across all tables marked for migration and document the following for each column:


Test Check

Owner

Priority

☐  Row count per table — establish baseline for parity checks

Data Engineer

Critical

☐  Null rate per column — identify mandatory fields with missing data

Data Engineer

Critical

☐  Distinct value counts — flag unexpected cardinality

Data Engineer

High

☐  Min / max / avg for numeric columns — identify out-of-range values

Data Engineer

High

☐  Date range validation — confirm timestamps within expected bounds

Data Engineer

High

☐  Duplicate detection — row-level and key-level deduplication

Data Engineer

Critical

☐  String length distribution — catch values exceeding target column width

Data Engineer

Critical

☐  Encoding check — identify non-UTF-8 characters that break target writes

Data Engineer

High

☐  Special character audit — flag values with commas, quotes, line breaks

Data Engineer

Medium


Output: A source data profile report documenting quality issues by table and column — this becomes your pre-migration baseline. Every post-migration check will compare against it.


1.2  Schema and Metadata Validation

Before any data moves, validate that the source-to-target mapping document accurately reflects both systems. Schema mismatches are the #1 cause of silent data corruption during migration.


Test Check

Owner

Priority

☐  Confirm all source tables exist in target schema with correct names

Data Engineer

Critical

☐  Validate data type compatibility for every mapped column

Data Engineer

Critical

☐  Check column length constraints — source VARCHAR(500) vs target VARCHAR(255)

Data Engineer

Critical

☐  Verify nullable vs NOT NULL constraints are correctly translated

Data Engineer

High

☐  Confirm primary key definitions match across source and target

Data Engineer

Critical

☐  Validate foreign key relationships are defined in target schema

Data Engineer

High

☐  Check default value handling for columns with defaults in source

Data Engineer

Medium

☐  Confirm index definitions are created in target for performance

Data Engineer

Medium

☐  Validate partition strategy matches if target uses partitioned tables

Data Engineer

High


1.3  Business Rule Inventory


This is the step most technical teams skip — and the one that causes the most post-go-live fires. Business rules are the logic that exists in the data, not in the schema. They aren't enforced by database constraints. They live in ETL code, stored procedures, and the institutional knowledge of data consumers.


Before migration, interview the owners of each major subject area and document:

  • Calculated fields and the formulas behind them

  • Aggregation logic used in reports that draw from this warehouse

  • Any data masking or anonymisation applied to sensitive columns

  • Lookup tables and the values consumers rely on being present

  • Filters or exclusions applied during ETL that won't be obvious in the target


These rules become your transformation validation test cases in Phase 2.


  Phase 2    Migration Testing  —  During Each Load Wave


Most data warehouse migrations run in waves — fact tables, dimension tables, and historical archives loaded in sequence. A testing checkpoint should follow every wave. Not at the end of the migration. After every wave.


The cost of catching a mapping error after wave one is a configuration fix. The cost of catching the same error after wave five — when subsequent tables have been built on top of incorrect data — is a full re-migration of multiple tables.


2.1  Row Count Parity

The most basic check — but it must be executed first, because everything else depends on knowing all records arrived.


Test Check

Owner

Priority

☐  Total row count matches source for each table in the wave

Data Engineer

Critical

☐  Row count by key partition (e.g. by month, by region, by entity)

Data Engineer

Critical

☐  Confirm no records rejected by target constraints — check error logs

Data Engineer

Critical

☐  Verify incremental load counts if running delta loads

Data Engineer

High

☐  Confirm deleted/archived records handled per defined strategy

Data Engineer

High


2.2  Field-Level Value Validation

Row count parity tells you all records arrived. Field-level validation tells you whether they arrived correctly. This is where most migrations fail quietly.


Test Check

Owner

Priority

☐  Spot-check critical columns for 100 randomly sampled records

QA Lead

Critical

☐  Full comparison of all values in high-risk columns (IDs, amounts, dates)

Data Engineer

Critical

☐  Validate numeric precision — no rounding errors in financial fields

Data Engineer

Critical

☐  Date/timestamp format consistency — no timezone conversion errors

Data Engineer

Critical

☐  String encoding preserved — no corruption of special characters

Data Engineer

High

☐  NULL values mapped correctly — not substituted with empty string or 0

Data Engineer

High

☐  Boolean field handling consistent — TRUE/FALSE vs 1/0 vs Y/N

Data Engineer

Medium


2.3  Transformation Validation

Every calculated field, derived column, aggregation, and business rule applied during migration must be tested against expected output. This is where the business rule inventory from Phase 1 becomes essential.


Test Check

Owner

Priority

☐  Calculated fields produce expected values for sample set of records

Data Engineer

Critical

☐  Aggregations match source — totals, counts, averages by key dimensions

Data Engineer

Critical

☐  JOIN logic produces expected output — no fan-out or record loss

Data Engineer

Critical

☐  Lookup table values resolve correctly in transformed data

Data Engineer

High

☐  Filters and exclusion rules applied consistently across full dataset

Data Engineer

High

☐  Data masking / anonymisation applied correctly to PII fields

QA Lead

Critical

☐  Derived date fields (e.g. fiscal year, age bands) calculated correctly

Data Engineer

High


2.4  Referential Integrity

Once dimension tables are loaded, fact tables must be validated against them. Every foreign key in a fact table must resolve to a primary key in its parent dimension — no orphan records.


Test Check

Owner

Priority

☐  All foreign key values in fact tables exist in corresponding dimension tables

Data Engineer

Critical

☐  No orphan records — child rows with no parent

Data Engineer

Critical

☐  Lookup keys (e.g. product_id, customer_id) resolve in reporting layer

Data Engineer

Critical

☐  Hierarchical relationships preserved (parent → child → grandchild)

Data Engineer

High

☐  Bridge tables correctly represent many-to-many relationships

Data Engineer

High


  Phase 3    Post-Migration Validation  —  Before and After Go-Live


Post-migration validation happens in two moments: immediately before go-live (cutover validation), and in the weeks following (production monitoring). Both are essential, and both are frequently under-resourced.


3.1  Cutover Validation

Cutover validation is the final gate. It runs against the most recent data load, under time pressure, and the output is the go/no-go decision. For this reason it must be fully automated — a validation suite that can be executed and produce a report in under an hour.


Test Check

Owner

Priority

☐  Final row counts match across all migrated tables

Data Engineer

Critical

☐  Reconciliation report produced — aggregates match source by key dimensions

Data Engineer

Critical

☐  All critical business rules verified with automated test suite

QA Lead

Critical

☐  Referential integrity confirmed across full fact/dimension model

Data Engineer

Critical

☐  BI tool reports tested — key dashboards produce expected metrics

QA Lead

Critical

☐  Application connectivity confirmed — all consuming systems connected

Data Engineer

High

☐  Performance benchmarks met — query response times within SLA

Data Engineer

High

☐  Access controls verified — user permissions correctly implemented

QA Lead

High

☐  Audit trail available for all migration validation runs

QA Lead

High

☐  Rollback plan confirmed and tested — can revert within defined window

Data Engineer

Critical


3.2  Post-Go-Live Monitoring

The migration is live. Users are on the new system. This is not the time to relax validation — it's the time to double down on monitoring. The two weeks following a data warehouse go-live are when previously undetected issues surface at scale.


Test Check

Owner

Priority

☐  Daily row count monitoring — alert on unexpected volume changes

Data Engineer

Critical

☐  Pipeline job completion monitoring — no silent failures

Data Engineer

Critical

☐  Report reconciliation — weekly comparison of key metrics vs source

QA Lead

Critical

☐  Schema change alerting — flag any unplanned structural changes

Data Engineer

High

☐  Data freshness checks — confirm incremental loads arriving on schedule

Data Engineer

High

☐  User-reported discrepancy triage — formal process for flagging data issues

QA Lead

High

☐  Performance trend monitoring — query execution time baselines

Data Engineer

Medium


Pro tip: Set up automated daily reconciliation for the first 30 days post-migration. The most common failure mode is a data issue that passes cutover validation but only surfaces when production volumes and usage patterns stress the system.


The 5 Most Common Data Warehouse Migration Testing Failures


Based on real migration post-mortems, these are the failure modes that recur most frequently — and that a rigorous testing checklist would have caught:


1.  Trusting the Source Schema Documentation

Source-to-target mapping documents are built from schema documentation. Schema documentation describes how the warehouse was designed — not how it evolved over years of use. Columns that appear as NOT NULL in the documentation contain nulls in practice. Fields documented as VARCHAR(100) contain values 247 characters long. Always profile the actual data, not the documentation.


2.  Testing at the End Instead of After Every Wave

Teams that test only at the end of the migration face one of two outcomes: they catch a systematic error that requires re-migrating multiple tables, or they don't catch it and it goes to production. Testing after every load wave catches errors when they're cheapest to fix.


3.  Forgetting the BI Layer

A migration that passes data-level validation can still fail at the reporting layer. A dashboard that was correct in the old warehouse can produce wrong numbers in the new one — because a calculation that was performed in the ETL layer is no longer being applied, or because a dimension key changed format. BI report validation must be part of the testing checklist.


4.  No Automated Reconciliation

Manual reconciliation — a data engineer writing SQL queries against source and target and comparing results in a spreadsheet — does not scale. At 1,000+ tables and billions of rows, manual reconciliation either misses systematic errors (because the sample is too small) or takes so long that the migration falls behind schedule. Automated reconciliation is not optional at enterprise scale.


5.  Stopping Validation at Go-Live

Production use surfaces issues that testing environments never will. Real user queries, unexpected edge cases in production data, and incremental load failures that only appear under real volume are all post-go-live phenomena. Post-migration monitoring is not operational housekeeping — it's the final phase of the testing programme.


Automating Your Migration Testing Checklist


The checklist above contains more than 60 individual test checks. Executing them manually across hundreds of tables is not feasible — and manual execution introduces the human errors you're trying to catch.


Automated data warehouse migration testing platforms can execute the full checklist — row count parity, field-level validation, transformation checks, referential integrity, reconciliation — against the entire dataset in a fraction of the time that manual testing requires.


More importantly, automated tests can be re-run after every configuration change. When a mapping error is found and corrected, the full validation suite confirms whether the fix resolved the problem or introduced new ones — something manual testing cannot do reliably within migration timelines.


Vexdata's Data Warehouse Migration Testing platform automates every phase of this checklist — from pre-migration source profiling to post-go-live monitoring — and produces stakeholder-ready validation reports after every run. It covers 100% of records, not samples, and runs in parallel across tables to meet the time constraints of cutover windows.


Migration Sign-Off Requirements by Phase


Each phase should produce a formal sign-off before the next begins. Here's the minimum required for each gate:


Phase Gate

Required Evidence

Sign-Off Owner

Pre-Migration

Source profile report; schema comparison document; business rules inventory; open issues log

Lead Data Engineer + Project Sponsor

Each Wave

Row count parity report; field validation results; referential integrity check; open defects

Data Engineer + QA Lead

Cutover / Go-Live

Final reconciliation report; BI validation results; performance benchmarks; access control confirmation; signed go/no-go document

Data Engineering Lead + QA Lead + Business Owner

Post-Go-Live (30d)

Daily monitoring reports; weekly reconciliation summaries; defect resolution log; user acceptance confirmation

Data Engineering Lead + Business Owner


The Bottom Line


A data warehouse migration is not complete when the data arrives in the target system. It's complete when you can prove — with documented test results — that every row arrived correctly, every transformation was applied accurately, every relationship is intact, and every downstream consumer is receiving the data they expect.


The checklist in this article gives you the testing framework to make that proof. Work through it phase by phase. Execute it after every load wave, not just at the end. Automate what can be automated — which is most of it.


If you're planning a data warehouse migration and want to see how automated validation works in practice, Vexdata's Data Warehouse Migration Testing platform runs the full checklist — source profiling, transformation validation, reconciliation, and post-go-live monitoring — across your entire dataset, not a sample, and produces audit-ready reports your stakeholders can actually use.


→  Learn more: vexdata.io/data-warehouse-migration     |     →  Book a demo: vexdata.io/contact

 
 
 

Comments


bottom of page