Data Warehouse Migration Testing: The Complete Checklist for Data Engineers
- Vexdata

- Apr 13
- 10 min read

If you've been involved in a data warehouse migration, you know the pattern. The project plan looks thorough. The technical team is experienced. The timeline seems reasonable. And then something breaks at 2 AM, two days before go-live, because a VARCHAR(255) field in the source system became VARCHAR(128) in the target — silently, without error, truncating eight months of customer notes.
Most data warehouse migration guides focus on the project: timelines, stakeholder communication, cutover planning, rollback strategy. Those things matter. But they're not what breaks migrations.
What breaks migrations is data. Specifically, data that was never tested systematically at every stage of the process.
This checklist is for data engineers and QA leads who need a rigorous, phase-by-phase testing framework — not a project management template. It covers every test type that matters, when to run it, who owns it, and what a pass/fail looks like. Use it as a working document alongside your migration.
"83% of data migration projects fail, exceed budget, or disrupt business operations." The reason is almost always data quality — not tooling, not timelines, not team capability.
Why Data Warehouse Migration Testing Is Different from Regular QA
Standard application QA tests functional behaviour — does the button work, does the form submit, does the API return the right status code. Data warehouse migration testing is a different discipline entirely.
You're not testing whether the system functions. You're testing whether every row of data that existed in the source warehouse arrived in the target warehouse with the correct values, in the correct structure, with the correct relationships intact — at a scale that makes manual verification impossible.
The testing surface is enormous. A mid-sized enterprise data warehouse typically contains:
Hundreds to thousands of tables across multiple schemas
Billions of individual field values to verify
Complex transformation logic applied during extraction or loading
Referential relationships that span dozens of tables
Business rules that exist only in the institutional knowledge of the team
Historical data accumulated over years, often with legacy quality issues baked in
Any one of these can fail silently. No error is thrown. The migration completes. The data arrives — but wrong.
That's why the testing framework for a data warehouse migration needs to be systematic, automated, and phase-specific. The checklist below gives you exactly that.
Phase 1 Pre-Migration Testing — Before a Single Row Moves
Pre-migration testing is the most valuable and most skipped phase of data warehouse validation. Every issue you find here costs a fraction of what it costs to find post-migration. The goal is to understand exactly what you're working with before the migration begins.
1.1 Source Data Profiling
Data profiling means analysing the source warehouse to characterise what the data actually looks like — not what the data dictionary says it should look like. These two things are rarely identical.
Run profiling across all tables marked for migration and document the following for each column:
Test Check | Owner | Priority |
☐ Row count per table — establish baseline for parity checks | Data Engineer | Critical |
☐ Null rate per column — identify mandatory fields with missing data | Data Engineer | Critical |
☐ Distinct value counts — flag unexpected cardinality | Data Engineer | High |
☐ Min / max / avg for numeric columns — identify out-of-range values | Data Engineer | High |
☐ Date range validation — confirm timestamps within expected bounds | Data Engineer | High |
☐ Duplicate detection — row-level and key-level deduplication | Data Engineer | Critical |
☐ String length distribution — catch values exceeding target column width | Data Engineer | Critical |
☐ Encoding check — identify non-UTF-8 characters that break target writes | Data Engineer | High |
☐ Special character audit — flag values with commas, quotes, line breaks | Data Engineer | Medium |
Output: A source data profile report documenting quality issues by table and column — this becomes your pre-migration baseline. Every post-migration check will compare against it.
1.2 Schema and Metadata Validation
Before any data moves, validate that the source-to-target mapping document accurately reflects both systems. Schema mismatches are the #1 cause of silent data corruption during migration.
Test Check | Owner | Priority |
☐ Confirm all source tables exist in target schema with correct names | Data Engineer | Critical |
☐ Validate data type compatibility for every mapped column | Data Engineer | Critical |
☐ Check column length constraints — source VARCHAR(500) vs target VARCHAR(255) | Data Engineer | Critical |
☐ Verify nullable vs NOT NULL constraints are correctly translated | Data Engineer | High |
☐ Confirm primary key definitions match across source and target | Data Engineer | Critical |
☐ Validate foreign key relationships are defined in target schema | Data Engineer | High |
☐ Check default value handling for columns with defaults in source | Data Engineer | Medium |
☐ Confirm index definitions are created in target for performance | Data Engineer | Medium |
☐ Validate partition strategy matches if target uses partitioned tables | Data Engineer | High |
1.3 Business Rule Inventory
This is the step most technical teams skip — and the one that causes the most post-go-live fires. Business rules are the logic that exists in the data, not in the schema. They aren't enforced by database constraints. They live in ETL code, stored procedures, and the institutional knowledge of data consumers.
Before migration, interview the owners of each major subject area and document:
Calculated fields and the formulas behind them
Aggregation logic used in reports that draw from this warehouse
Any data masking or anonymisation applied to sensitive columns
Lookup tables and the values consumers rely on being present
Filters or exclusions applied during ETL that won't be obvious in the target
These rules become your transformation validation test cases in Phase 2.
Phase 2 Migration Testing — During Each Load Wave
Most data warehouse migrations run in waves — fact tables, dimension tables, and historical archives loaded in sequence. A testing checkpoint should follow every wave. Not at the end of the migration. After every wave.
The cost of catching a mapping error after wave one is a configuration fix. The cost of catching the same error after wave five — when subsequent tables have been built on top of incorrect data — is a full re-migration of multiple tables.
2.1 Row Count Parity
The most basic check — but it must be executed first, because everything else depends on knowing all records arrived.
Test Check | Owner | Priority |
☐ Total row count matches source for each table in the wave | Data Engineer | Critical |
☐ Row count by key partition (e.g. by month, by region, by entity) | Data Engineer | Critical |
☐ Confirm no records rejected by target constraints — check error logs | Data Engineer | Critical |
☐ Verify incremental load counts if running delta loads | Data Engineer | High |
☐ Confirm deleted/archived records handled per defined strategy | Data Engineer | High |
2.2 Field-Level Value Validation
Row count parity tells you all records arrived. Field-level validation tells you whether they arrived correctly. This is where most migrations fail quietly.
Test Check | Owner | Priority |
☐ Spot-check critical columns for 100 randomly sampled records | QA Lead | Critical |
☐ Full comparison of all values in high-risk columns (IDs, amounts, dates) | Data Engineer | Critical |
☐ Validate numeric precision — no rounding errors in financial fields | Data Engineer | Critical |
☐ Date/timestamp format consistency — no timezone conversion errors | Data Engineer | Critical |
☐ String encoding preserved — no corruption of special characters | Data Engineer | High |
☐ NULL values mapped correctly — not substituted with empty string or 0 | Data Engineer | High |
☐ Boolean field handling consistent — TRUE/FALSE vs 1/0 vs Y/N | Data Engineer | Medium |
2.3 Transformation Validation
Every calculated field, derived column, aggregation, and business rule applied during migration must be tested against expected output. This is where the business rule inventory from Phase 1 becomes essential.
Test Check | Owner | Priority |
☐ Calculated fields produce expected values for sample set of records | Data Engineer | Critical |
☐ Aggregations match source — totals, counts, averages by key dimensions | Data Engineer | Critical |
☐ JOIN logic produces expected output — no fan-out or record loss | Data Engineer | Critical |
☐ Lookup table values resolve correctly in transformed data | Data Engineer | High |
☐ Filters and exclusion rules applied consistently across full dataset | Data Engineer | High |
☐ Data masking / anonymisation applied correctly to PII fields | QA Lead | Critical |
☐ Derived date fields (e.g. fiscal year, age bands) calculated correctly | Data Engineer | High |
2.4 Referential Integrity
Once dimension tables are loaded, fact tables must be validated against them. Every foreign key in a fact table must resolve to a primary key in its parent dimension — no orphan records.
Test Check | Owner | Priority |
☐ All foreign key values in fact tables exist in corresponding dimension tables | Data Engineer | Critical |
☐ No orphan records — child rows with no parent | Data Engineer | Critical |
☐ Lookup keys (e.g. product_id, customer_id) resolve in reporting layer | Data Engineer | Critical |
☐ Hierarchical relationships preserved (parent → child → grandchild) | Data Engineer | High |
☐ Bridge tables correctly represent many-to-many relationships | Data Engineer | High |
Phase 3 Post-Migration Validation — Before and After Go-Live
Post-migration validation happens in two moments: immediately before go-live (cutover validation), and in the weeks following (production monitoring). Both are essential, and both are frequently under-resourced.
3.1 Cutover Validation
Cutover validation is the final gate. It runs against the most recent data load, under time pressure, and the output is the go/no-go decision. For this reason it must be fully automated — a validation suite that can be executed and produce a report in under an hour.
Test Check | Owner | Priority |
☐ Final row counts match across all migrated tables | Data Engineer | Critical |
☐ Reconciliation report produced — aggregates match source by key dimensions | Data Engineer | Critical |
☐ All critical business rules verified with automated test suite | QA Lead | Critical |
☐ Referential integrity confirmed across full fact/dimension model | Data Engineer | Critical |
☐ BI tool reports tested — key dashboards produce expected metrics | QA Lead | Critical |
☐ Application connectivity confirmed — all consuming systems connected | Data Engineer | High |
☐ Performance benchmarks met — query response times within SLA | Data Engineer | High |
☐ Access controls verified — user permissions correctly implemented | QA Lead | High |
☐ Audit trail available for all migration validation runs | QA Lead | High |
☐ Rollback plan confirmed and tested — can revert within defined window | Data Engineer | Critical |
3.2 Post-Go-Live Monitoring
The migration is live. Users are on the new system. This is not the time to relax validation — it's the time to double down on monitoring. The two weeks following a data warehouse go-live are when previously undetected issues surface at scale.
Test Check | Owner | Priority |
☐ Daily row count monitoring — alert on unexpected volume changes | Data Engineer | Critical |
☐ Pipeline job completion monitoring — no silent failures | Data Engineer | Critical |
☐ Report reconciliation — weekly comparison of key metrics vs source | QA Lead | Critical |
☐ Schema change alerting — flag any unplanned structural changes | Data Engineer | High |
☐ Data freshness checks — confirm incremental loads arriving on schedule | Data Engineer | High |
☐ User-reported discrepancy triage — formal process for flagging data issues | QA Lead | High |
☐ Performance trend monitoring — query execution time baselines | Data Engineer | Medium |
Pro tip: Set up automated daily reconciliation for the first 30 days post-migration. The most common failure mode is a data issue that passes cutover validation but only surfaces when production volumes and usage patterns stress the system.
The 5 Most Common Data Warehouse Migration Testing Failures
Based on real migration post-mortems, these are the failure modes that recur most frequently — and that a rigorous testing checklist would have caught:
1. Trusting the Source Schema Documentation
Source-to-target mapping documents are built from schema documentation. Schema documentation describes how the warehouse was designed — not how it evolved over years of use. Columns that appear as NOT NULL in the documentation contain nulls in practice. Fields documented as VARCHAR(100) contain values 247 characters long. Always profile the actual data, not the documentation.
2. Testing at the End Instead of After Every Wave
Teams that test only at the end of the migration face one of two outcomes: they catch a systematic error that requires re-migrating multiple tables, or they don't catch it and it goes to production. Testing after every load wave catches errors when they're cheapest to fix.
3. Forgetting the BI Layer
A migration that passes data-level validation can still fail at the reporting layer. A dashboard that was correct in the old warehouse can produce wrong numbers in the new one — because a calculation that was performed in the ETL layer is no longer being applied, or because a dimension key changed format. BI report validation must be part of the testing checklist.
4. No Automated Reconciliation
Manual reconciliation — a data engineer writing SQL queries against source and target and comparing results in a spreadsheet — does not scale. At 1,000+ tables and billions of rows, manual reconciliation either misses systematic errors (because the sample is too small) or takes so long that the migration falls behind schedule. Automated reconciliation is not optional at enterprise scale.
5. Stopping Validation at Go-Live
Production use surfaces issues that testing environments never will. Real user queries, unexpected edge cases in production data, and incremental load failures that only appear under real volume are all post-go-live phenomena. Post-migration monitoring is not operational housekeeping — it's the final phase of the testing programme.
Automating Your Migration Testing Checklist
The checklist above contains more than 60 individual test checks. Executing them manually across hundreds of tables is not feasible — and manual execution introduces the human errors you're trying to catch.
Automated data warehouse migration testing platforms can execute the full checklist — row count parity, field-level validation, transformation checks, referential integrity, reconciliation — against the entire dataset in a fraction of the time that manual testing requires.
More importantly, automated tests can be re-run after every configuration change. When a mapping error is found and corrected, the full validation suite confirms whether the fix resolved the problem or introduced new ones — something manual testing cannot do reliably within migration timelines.
Vexdata's Data Warehouse Migration Testing platform automates every phase of this checklist — from pre-migration source profiling to post-go-live monitoring — and produces stakeholder-ready validation reports after every run. It covers 100% of records, not samples, and runs in parallel across tables to meet the time constraints of cutover windows.
Migration Sign-Off Requirements by Phase
Each phase should produce a formal sign-off before the next begins. Here's the minimum required for each gate:
Phase Gate | Required Evidence | Sign-Off Owner |
Pre-Migration | Source profile report; schema comparison document; business rules inventory; open issues log | Lead Data Engineer + Project Sponsor |
Each Wave | Row count parity report; field validation results; referential integrity check; open defects | Data Engineer + QA Lead |
Cutover / Go-Live | Final reconciliation report; BI validation results; performance benchmarks; access control confirmation; signed go/no-go document | Data Engineering Lead + QA Lead + Business Owner |
Post-Go-Live (30d) | Daily monitoring reports; weekly reconciliation summaries; defect resolution log; user acceptance confirmation | Data Engineering Lead + Business Owner |
The Bottom Line
A data warehouse migration is not complete when the data arrives in the target system. It's complete when you can prove — with documented test results — that every row arrived correctly, every transformation was applied accurately, every relationship is intact, and every downstream consumer is receiving the data they expect.
The checklist in this article gives you the testing framework to make that proof. Work through it phase by phase. Execute it after every load wave, not just at the end. Automate what can be automated — which is most of it.
If you're planning a data warehouse migration and want to see how automated validation works in practice, Vexdata's Data Warehouse Migration Testing platform runs the full checklist — source profiling, transformation validation, reconciliation, and post-go-live monitoring — across your entire dataset, not a sample, and produces audit-ready reports your stakeholders can actually use.
→ Learn more: vexdata.io/data-warehouse-migration | → Book a demo: vexdata.io/contact




Comments