Data Warehouse Migration Testing: The Complete Checklist for Data Engineers

Vexdata
Apr 13
10 min read

data warehouse migration testing checklist

If you've been involved in a data warehouse migration, you know the pattern. The project plan looks thorough. The technical team is experienced. The timeline seems reasonable. And then something breaks at 2 AM, two days before go-live, because a VARCHAR(255) field in the source system became VARCHAR(128) in the target — silently, without error, truncating eight months of customer notes.

Most data warehouse migration guides focus on the project: timelines, stakeholder communication, cutover planning, rollback strategy. Those things matter. But they're not what breaks migrations.

What breaks migrations is data. Specifically, data that was never tested systematically at every stage of the process.

This checklist is for data engineers and QA leads who need a rigorous, phase-by-phase testing framework — not a project management template. It covers every test type that matters, when to run it, who owns it, and what a pass/fail looks like. Use it as a working document alongside your migration.

"83% of data migration projects fail, exceed budget, or disrupt business operations." The reason is almost always data quality — not tooling, not timelines, not team capability.

Why Data Warehouse Migration Testing Is Different from Regular QA

Standard application QA tests functional behaviour — does the button work, does the form submit, does the API return the right status code. Data warehouse migration testing is a different discipline entirely.

You're not testing whether the system functions. You're testing whether every row of data that existed in the source warehouse arrived in the target warehouse with the correct values, in the correct structure, with the correct relationships intact — at a scale that makes manual verification impossible.

The testing surface is enormous. A mid-sized enterprise data warehouse typically contains:

Hundreds to thousands of tables across multiple schemas
Billions of individual field values to verify
Complex transformation logic applied during extraction or loading
Referential relationships that span dozens of tables
Business rules that exist only in the institutional knowledge of the team
Historical data accumulated over years, often with legacy quality issues baked in

Any one of these can fail silently. No error is thrown. The migration completes. The data arrives — but wrong.

That's why the testing framework for a data warehouse migration needs to be systematic, automated, and phase-specific. The checklist below gives you exactly that.

Phase 1 Pre-Migration Testing — Before a Single Row Moves

Pre-migration testing is the most valuable and most skipped phase of data warehouse validation. Every issue you find here costs a fraction of what it costs to find post-migration. The goal is to understand exactly what you're working with before the migration begins.

1.1 Source Data Profiling

Data profiling means analysing the source warehouse to characterise what the data actually looks like — not what the data dictionary says it should look like. These two things are rarely identical.

Run profiling across all tables marked for migration and document the following for each column:

Test Check	Owner	Priority
☐ Row count per table — establish baseline for parity checks	Data Engineer	Critical
☐ Null rate per column — identify mandatory fields with missing data	Data Engineer	Critical
☐ Distinct value counts — flag unexpected cardinality	Data Engineer	High
☐ Min / max / avg for numeric columns — identify out-of-range values	Data Engineer	High
☐ Date range validation — confirm timestamps within expected bounds	Data Engineer	High
☐ Duplicate detection — row-level and key-level deduplication	Data Engineer	Critical
☐ String length distribution — catch values exceeding target column width	Data Engineer	Critical
☐ Encoding check — identify non-UTF-8 characters that break target writes	Data Engineer	High
☐ Special character audit — flag values with commas, quotes, line breaks	Data Engineer	Medium

Output: A source data profile report documenting quality issues by table and column — this becomes your pre-migration baseline. Every post-migration check will compare against it.

1.2 Schema and Metadata Validation

Before any data moves, validate that the source-to-target mapping document accurately reflects both systems. Schema mismatches are the #1 cause of silent data corruption during migration.

Test Check	Owner	Priority
☐ Confirm all source tables exist in target schema with correct names	Data Engineer	Critical
☐ Validate data type compatibility for every mapped column	Data Engineer	Critical
☐ Check column length constraints — source VARCHAR(500) vs target VARCHAR(255)	Data Engineer	Critical
☐ Verify nullable vs NOT NULL constraints are correctly translated	Data Engineer	High
☐ Confirm primary key definitions match across source and target	Data Engineer	Critical
☐ Validate foreign key relationships are defined in target schema	Data Engineer	High
☐ Check default value handling for columns with defaults in source	Data Engineer	Medium
☐ Confirm index definitions are created in target for performance	Data Engineer	Medium
☐ Validate partition strategy matches if target uses partitioned tables	Data Engineer	High

1.3 Business Rule Inventory

This is the step most technical teams skip — and the one that causes the most post-go-live fires. Business rules are the logic that exists in the data, not in the schema. They aren't enforced by database constraints. They live in ETL code, stored procedures, and the institutional knowledge of data consumers.

Before migration, interview the owners of each major subject area and document:

Calculated fields and the formulas behind them
Aggregation logic used in reports that draw from this warehouse
Any data masking or anonymisation applied to sensitive columns
Lookup tables and the values consumers rely on being present
Filters or exclusions applied during ETL that won't be obvious in the target

These rules become your transformation validation test cases in Phase 2.

Phase 2 Migration Testing — During Each Load Wave

Most data warehouse migrations run in waves — fact tables, dimension tables, and historical archives loaded in sequence. A testing checkpoint should follow every wave. Not at the end of the migration. After every wave.

The cost of catching a mapping error after wave one is a configuration fix. The cost of catching the same error after wave five — when subsequent tables have been built on top of incorrect data — is a full re-migration of multiple tables.

2.1 Row Count Parity

The most basic check — but it must be executed first, because everything else depends on knowing all records arrived.

Test Check	Owner	Priority
☐ Total row count matches source for each table in the wave	Data Engineer	Critical
☐ Row count by key partition (e.g. by month, by region, by entity)	Data Engineer	Critical
☐ Confirm no records rejected by target constraints — check error logs	Data Engineer	Critical
☐ Verify incremental load counts if running delta loads	Data Engineer	High
☐ Confirm deleted/archived records handled per defined strategy	Data Engineer	High

2.2 Field-Level Value Validation

Row count parity tells you all records arrived. Field-level validation tells you whether they arrived correctly. This is where most migrations fail quietly.

Test Check	Owner	Priority
☐ Spot-check critical columns for 100 randomly sampled records	QA Lead	Critical
☐ Full comparison of all values in high-risk columns (IDs, amounts, dates)	Data Engineer	Critical
☐ Validate numeric precision — no rounding errors in financial fields	Data Engineer	Critical
☐ Date/timestamp format consistency — no timezone conversion errors	Data Engineer	Critical
☐ String encoding preserved — no corruption of special characters	Data Engineer	High
☐ NULL values mapped correctly — not substituted with empty string or 0	Data Engineer	High
☐ Boolean field handling consistent — TRUE/FALSE vs 1/0 vs Y/N	Data Engineer	Medium

2.3 Transformation Validation

Every calculated field, derived column, aggregation, and business rule applied during migration must be tested against expected output. This is where the business rule inventory from Phase 1 becomes essential.

Test Check	Owner	Priority
☐ Calculated fields produce expected values for sample set of records	Data Engineer	Critical
☐ Aggregations match source — totals, counts, averages by key dimensions	Data Engineer	Critical
☐ JOIN logic produces expected output — no fan-out or record loss	Data Engineer	Critical
☐ Lookup table values resolve correctly in transformed data	Data Engineer	High
☐ Filters and exclusion rules applied consistently across full dataset	Data Engineer	High
☐ Data masking / anonymisation applied correctly to PII fields	QA Lead	Critical
☐ Derived date fields (e.g. fiscal year, age bands) calculated correctly	Data Engineer	High

2.4 Referential Integrity

Once dimension tables are loaded, fact tables must be validated against them. Every foreign key in a fact table must resolve to a primary key in its parent dimension — no orphan records.

Test Check	Owner	Priority
☐ All foreign key values in fact tables exist in corresponding dimension tables	Data Engineer	Critical
☐ No orphan records — child rows with no parent	Data Engineer	Critical
☐ Lookup keys (e.g. product_id, customer_id) resolve in reporting layer	Data Engineer	Critical
☐ Hierarchical relationships preserved (parent → child → grandchild)	Data Engineer	High
☐ Bridge tables correctly represent many-to-many relationships	Data Engineer	High

Phase 3 Post-Migration Validation — Before and After Go-Live

Post-migration validation happens in two moments: immediately before go-live (cutover validation), and in the weeks following (production monitoring). Both are essential, and both are frequently under-resourced.

3.1 Cutover Validation

Cutover validation is the final gate. It runs against the most recent data load, under time pressure, and the output is the go/no-go decision. For this reason it must be fully automated — a validation suite that can be executed and produce a report in under an hour.

Test Check	Owner	Priority
☐ Final row counts match across all migrated tables	Data Engineer	Critical
☐ Reconciliation report produced — aggregates match source by key dimensions	Data Engineer	Critical
☐ All critical business rules verified with automated test suite	QA Lead	Critical
☐ Referential integrity confirmed across full fact/dimension model	Data Engineer	Critical
☐ BI tool reports tested — key dashboards produce expected metrics	QA Lead	Critical
☐ Application connectivity confirmed — all consuming systems connected	Data Engineer	High
☐ Performance benchmarks met — query response times within SLA	Data Engineer	High
☐ Access controls verified — user permissions correctly implemented	QA Lead	High
☐ Audit trail available for all migration validation runs	QA Lead	High
☐ Rollback plan confirmed and tested — can revert within defined window	Data Engineer	Critical

3.2 Post-Go-Live Monitoring

The migration is live. Users are on the new system. This is not the time to relax validation — it's the time to double down on monitoring. The two weeks following a data warehouse go-live are when previously undetected issues surface at scale.

Test Check	Owner	Priority
☐ Daily row count monitoring — alert on unexpected volume changes	Data Engineer	Critical
☐ Pipeline job completion monitoring — no silent failures	Data Engineer	Critical
☐ Report reconciliation — weekly comparison of key metrics vs source	QA Lead	Critical
☐ Schema change alerting — flag any unplanned structural changes	Data Engineer	High
☐ Data freshness checks — confirm incremental loads arriving on schedule	Data Engineer	High
☐ User-reported discrepancy triage — formal process for flagging data issues	QA Lead	High
☐ Performance trend monitoring — query execution time baselines	Data Engineer	Medium

Pro tip: Set up automated daily reconciliation for the first 30 days post-migration. The most common failure mode is a data issue that passes cutover validation but only surfaces when production volumes and usage patterns stress the system.

The 5 Most Common Data Warehouse Migration Testing Failures

Based on real migration post-mortems, these are the failure modes that recur most frequently — and that a rigorous testing checklist would have caught:

1. Trusting the Source Schema Documentation

Source-to-target mapping documents are built from schema documentation. Schema documentation describes how the warehouse was designed — not how it evolved over years of use. Columns that appear as NOT NULL in the documentation contain nulls in practice. Fields documented as VARCHAR(100) contain values 247 characters long. Always profile the actual data, not the documentation.

2. Testing at the End Instead of After Every Wave

Teams that test only at the end of the migration face one of two outcomes: they catch a systematic error that requires re-migrating multiple tables, or they don't catch it and it goes to production. Testing after every load wave catches errors when they're cheapest to fix.

3. Forgetting the BI Layer

A migration that passes data-level validation can still fail at the reporting layer. A dashboard that was correct in the old warehouse can produce wrong numbers in the new one — because a calculation that was performed in the ETL layer is no longer being applied, or because a dimension key changed format. BI report validation must be part of the testing checklist.

4. No Automated Reconciliation

Manual reconciliation — a data engineer writing SQL queries against source and target and comparing results in a spreadsheet — does not scale. At 1,000+ tables and billions of rows, manual reconciliation either misses systematic errors (because the sample is too small) or takes so long that the migration falls behind schedule. Automated reconciliation is not optional at enterprise scale.

5. Stopping Validation at Go-Live

Production use surfaces issues that testing environments never will. Real user queries, unexpected edge cases in production data, and incremental load failures that only appear under real volume are all post-go-live phenomena. Post-migration monitoring is not operational housekeeping — it's the final phase of the testing programme.

Automating Your Migration Testing Checklist

The checklist above contains more than 60 individual test checks. Executing them manually across hundreds of tables is not feasible — and manual execution introduces the human errors you're trying to catch.

Automated data warehouse migration testing platforms can execute the full checklist — row count parity, field-level validation, transformation checks, referential integrity, reconciliation — against the entire dataset in a fraction of the time that manual testing requires.

More importantly, automated tests can be re-run after every configuration change. When a mapping error is found and corrected, the full validation suite confirms whether the fix resolved the problem or introduced new ones — something manual testing cannot do reliably within migration timelines.

Vexdata's Data Warehouse Migration Testing platform automates every phase of this checklist — from pre-migration source profiling to post-go-live monitoring — and produces stakeholder-ready validation reports after every run. It covers 100% of records, not samples, and runs in parallel across tables to meet the time constraints of cutover windows.

Migration Sign-Off Requirements by Phase

Each phase should produce a formal sign-off before the next begins. Here's the minimum required for each gate:

Phase Gate	Required Evidence	Sign-Off Owner
Pre-Migration	Source profile report; schema comparison document; business rules inventory; open issues log	Lead Data Engineer + Project Sponsor
Each Wave	Row count parity report; field validation results; referential integrity check; open defects	Data Engineer + QA Lead
Cutover / Go-Live	Final reconciliation report; BI validation results; performance benchmarks; access control confirmation; signed go/no-go document	Data Engineering Lead + QA Lead + Business Owner
Post-Go-Live (30d)	Daily monitoring reports; weekly reconciliation summaries; defect resolution log; user acceptance confirmation	Data Engineering Lead + Business Owner

The Bottom Line

A data warehouse migration is not complete when the data arrives in the target system. It's complete when you can prove — with documented test results — that every row arrived correctly, every transformation was applied accurately, every relationship is intact, and every downstream consumer is receiving the data they expect.

The checklist in this article gives you the testing framework to make that proof. Work through it phase by phase. Execute it after every load wave, not just at the end. Automate what can be automated — which is most of it.

If you're planning a data warehouse migration and want to see how automated validation works in practice, Vexdata's Data Warehouse Migration Testing platform runs the full checklist — source profiling, transformation validation, reconciliation, and post-go-live monitoring — across your entire dataset, not a sample, and produces audit-ready reports your stakeholders can actually use.

→ Learn more: vexdata.io/data-warehouse-migration | → Book a demo: vexdata.io/contact

Data Warehouse Migration Testing: The Complete Checklist for Data Engineers

Recent Posts

Comments

Data Ingestion Validation

Data Transformation Validation

Data Lake Testing

Date Warehouse Migration

Data Migration Testing

BI Tool Testing

Cloud Migration Testing

CRM Tool Migration Testing

Flat File Testing

Document Validation

Data Validation

Data Migration Testing

Data Quality and Cleansing

Data Observability

Designed by DataDrivify