Snowflake Migration Validation: What to Test Before, During, and After Cutover
- Vexdata


Migrating to Snowflake is the most common data warehouse migration happening right now. Teradata to Snowflake. Redshift to Snowflake. Cloudera / Hadoop to Snowflake. Oracle to Snowflake. The reasons are well understood: elastic compute, consumption-based pricing, separation of storage and compute, and native support for semi-structured data at scale.
What's less discussed is why so many of these migrations pass technical validation and still fail in production. The pipeline completes. Snowflake shows the tables loaded. No errors in the job logs. And two weeks later, a BI analyst notices that revenue figures for Q3 are off by 3% — and nobody can immediately explain why.
The answer is almost always one of three things: a data type that behaves differently in Snowflake than it did in the source system, a timestamp that silently shifted due to timezone handling, or a transformation rule that was applied in the source ETL and wasn't carried over to the new ELT pipeline.
This guide is written for data engineers and data platform teams running Snowflake migrations. It covers the Snowflake-specific validation checks that generic migration guides don't — the data type traps, the timestamp edge cases, the NULL semantics differences, and the cutover checks that actually matter when you're under go-live pressure.
"Every failed Snowflake migration I've rescued had the same root cause: validation treated as a checkbox, not a phase." — VP of Data Engineering (cited in Aegis Softtech migration guide)
Why Snowflake Migrations Need Snowflake-Specific Validation
Generic migration testing advice — profile your source, check row counts, validate transformations — applies to every migration. But Snowflake introduces a set of platform-specific behaviours that your validation suite must account for explicitly. If you're treating a Snowflake migration like any other database-to-database move, these will catch you.
Case Sensitivity
Teradata is case-insensitive by default. Snowflake is case-sensitive. The string values 'Glass', 'GLASS', and 'glass' are three distinct values in Snowflake. If your source system stored values inconsistently — some uppercase, some mixed — and your queries relied on case-insensitive comparisons, those queries will return incorrect results in Snowflake without throwing any errors.
⚠ Validation requirement: For every string column used in a join or filter condition, check for case inconsistency in source data and confirm Snowflake queries handle it explicitly with UPPER() or LOWER().
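As a sketch of that check, the following query surfaces join-key values that differ only by case; `orders` and `customer_key` are placeholder names for your own schema:

```sql
-- Surface join-key values that collide only when compared case-insensitively.
SELECT UPPER(customer_key)           AS key_normalized,
       COUNT(DISTINCT customer_key)  AS case_variants
FROM orders
GROUP BY 1
HAVING COUNT(DISTINCT customer_key) > 1;
```

Any row returned represents a value family (e.g., 'Glass' / 'GLASS') that a case-insensitive source treated as one value and Snowflake treats as several.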
Primary Key and Foreign Key Enforcement
In Teradata, Oracle, and most traditional warehouses, primary key and foreign key constraints are enforced — the database rejects records that violate them. In Snowflake, these constraints are defined but not enforced. Snowflake stores the constraint metadata for query optimisation hints but does not prevent duplicate primary keys or orphan foreign keys from being inserted.
This means that ETL load processes which relied on constraint enforcement to prevent bad data from entering the source warehouse will not get the same protection in Snowflake. Duplicate records and orphan foreign keys that the source system would have rejected will load successfully into Snowflake and quietly corrupt your data model.
⚠ Validation requirement: Run explicit duplicate key checks and referential integrity checks on every loaded table. Do not assume constraint definitions in the DDL provide any enforcement.
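A minimal sketch of both checks, using hypothetical `orders` and `customers` tables:

```sql
-- Duplicate primary key check: Snowflake records the PK in metadata
-- but will happily store duplicate key values.
SELECT order_id, COUNT(*) AS copies
FROM orders
GROUP BY order_id
HAVING COUNT(*) > 1;

-- Referential integrity check: child rows whose parent row never loaded.
SELECT o.order_id, o.customer_id
FROM orders o
LEFT JOIN customers c
  ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;
```

Both queries should return zero rows before a load wave is signed off.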
NULL Semantics
Snowflake's NULL handling differs subtly from some source platforms. In Teradata, COALESCE() with mixed numeric and string parameters converts numerics to string. In Snowflake, it converts strings to numeric — which can cause unexpected type coercion errors or silent value changes in columns that mix types.
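One defensive pattern (a sketch, not official guidance) is to remove the ambiguity by casting explicitly before coalescing; `amount` stands in for any numeric column that the source ETL coalesced against a string default:

```sql
-- Explicit cast removes the platform-dependent coercion direction:
-- every parameter is already a string before COALESCE ever sees it.
SELECT COALESCE(TO_VARCHAR(amount), 'unknown') AS amount_display
FROM orders;
```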
The Data Type Mapping Table: Where Migrations Break Silently
This is the section most migration teams skip. Data type differences between source platforms and Snowflake are the primary source of silent data corruption — values that load without errors but arrive with truncated precision, incorrect formatting, or fundamentally different behaviour.
The table below covers the most common and most dangerous mappings across Teradata, Redshift, and Cloudera/Hive:
Source Type | Snowflake Type | Validation Risk |
TD BYTEINT | SMALLINT | Value range change: BYTEINT is a signed single byte, −128 to 127; SMALLINT is −32,768 to 32,767. Check for downstream logic that assumed the 1-byte boundary. |
TD PERIOD(DATE) | VARCHAR (2 cols) | Snowflake has no PERIOD type. Columns must be split into start/end DATE columns. Any code using PERIOD functions requires rewrite. |
TD TIMESTAMP(6) | TIMESTAMP_NTZ(9) | Teradata stores 6 fractional digits; Snowflake default is 9. Precision mismatch can break exact equality comparisons. |
RS INTERVAL DAY TO SECOND | VARCHAR | Redshift INTERVAL types have no Snowflake equivalent — converted to VARCHAR. Any arithmetic using these fields breaks silently. |
RS HLLSKETCH | Not supported | Redshift's HyperLogLog sketch type has no Snowflake equivalent. Columns using this type require a redesign — e.g., recomputing approximate distinct counts with Snowflake's APPROX_COUNT_DISTINCT or HLL functions. |
RS BPCHAR | VARCHAR | Redshift fixed-length CHAR converted to variable-length VARCHAR. Trailing space behaviour changes — affects string comparison logic. |
Hive STRING | VARCHAR(16777216) | Hive STRING maps to Snowflake max VARCHAR. Existing length constraints are lost. Downstream column length assumptions break. |
Hive TINYINT | SMALLINT | Range expansion: TINYINT is −128 to 127; SMALLINT is −32,768 to 32,767. Values load correctly but type contracts change. |
Oracle NUMBER(p,s) | NUMBER(p,s) | Generally safe but verify precision/scale explicitly. Oracle NUMBER with no declared precision maps to Snowflake's default NUMBER(38,0) — fractional values can be silently rounded unless a scale is set. |
SQL Server DATETIME2 | TIMESTAMP_NTZ | Timezone-naive mapping is correct but fractional second precision must match. Verify microsecond values are preserved. |
Before migration begins: Run a full data type audit of your source schema. Flag every column using a type in this table and define the explicit validation rule for each one before a single row moves.
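On the Snowflake side, a first-pass audit of the landed DDL can be driven from `INFORMATION_SCHEMA.COLUMNS`; the database name and type filter below are illustrative:

```sql
-- List columns whose landed types came from risky source mappings:
-- unbounded VARCHARs (reported as TEXT) and timestamp columns whose
-- NTZ/TZ/LTZ choice should be explicit, not a session default.
SELECT table_name, column_name, data_type, character_maximum_length
FROM analytics_db.INFORMATION_SCHEMA.COLUMNS
WHERE data_type IN ('TEXT', 'TIMESTAMP_NTZ', 'TIMESTAMP_TZ', 'TIMESTAMP_LTZ')
ORDER BY table_name, column_name;
```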
The Timestamp Problem: Snowflake's Most Common Silent Failure
Timestamp handling is the single most common source of post-migration data discrepancies in Snowflake migrations. It deserves its own section.
Snowflake has three timestamp types with distinct behaviour:
TIMESTAMP_NTZ (no timezone) — stores the literal datetime value with no timezone context. The most common default.
TIMESTAMP_TZ (with timezone) — stores the UTC value plus a timezone offset. Values from different timezones are stored with their offset.
TIMESTAMP_LTZ (local timezone) — stores in UTC and converts to the session timezone on display. Output changes based on session context.
When migrating from a source system that stored timestamps in local time — common in legacy Teradata and Oracle warehouses — loading into TIMESTAMP_NTZ without timezone conversion means your timestamps are technically correct but represent the wrong moment in time relative to UTC.
This doesn't produce an error. The values load. Your row counts match. Your field-level comparison passes if you're comparing string representations. But any analysis that groups by date or calculates time differences may be off by hours — silently.
⚠ Test required: For every timestamp column in the migration, compare date-partitioned aggregates between source and target. Do not rely on string equality checks for timestamp validation.
Snowflake Timestamp Validation Queries
Use these during migration testing to catch timezone drift:
```sql
-- Check for day-boundary drift between source and target timestamps
SELECT DATE_TRUNC('day', source_ts) AS source_day,
       DATE_TRUNC('day', target_ts) AS target_day,
       COUNT(*) AS mismatch_count
FROM source_target_comparison
WHERE DATE_TRUNC('day', source_ts) <> DATE_TRUNC('day', target_ts)
GROUP BY 1, 2;

-- Verify TIMESTAMP_NTZ values against their intended local timezone
SELECT CONVERT_TIMEZONE('UTC', 'America/New_York', your_ts_column)
FROM your_table
LIMIT 100;
```
Phase 1 Pre-Migration — Before Any Data Moves to Snowflake
Pre-migration validation for Snowflake goes beyond generic source profiling. The goal is to identify every data characteristic that will behave differently in Snowflake before the migration begins.
Source Schema Audit for Snowflake Compatibility
Validation Check | Applies To | Priority |
☐ Inventory all PERIOD, INTERVAL, and platform-specific data types — map each to Snowflake equivalent | All sources | Critical |
☐ Profile all VARCHAR columns for actual max length — confirm target column definitions are wide enough | All sources | Critical |
☐ Audit all TIMESTAMP columns — document timezone context for each (local, UTC, or naive) | All sources | Critical |
☐ Check for CHAR vs VARCHAR columns — document trailing space behaviour in source system | Teradata / Oracle | High |
☐ Identify all columns used as join keys — flag string keys that may have case inconsistencies | Teradata | Critical |
☐ List all PRIMARY KEY and FOREIGN KEY constraints — document enforcement behaviour in source | All sources | Critical |
☐ Audit stored procedures, macros, and UDFs that reference platform-specific functions | Teradata / Oracle | High |
☐ Identify Redshift HLLSKETCH columns (no Snowflake equivalent) and SUPER columns (map to VARIANT — verify queries against them still work) | Redshift | High |
☐ Document all columns using INTERVAL types in Redshift — confirm VARCHAR conversion strategy | Redshift | Critical |
☐ Run null rate profiling on all columns — flag NOT NULL columns with nulls in source data | All sources | High |
Snowflake Target Environment Validation
Validation Check | Applies To | Priority |
☐ Confirm DDL for all target tables matches source-to-target mapping document | Data Engineer | Critical |
☐ Validate Snowflake database, schema, and table naming conventions applied consistently | Data Engineer | Medium |
☐ Confirm TIMESTAMP column types explicitly defined (NTZ / TZ / LTZ) — not relying on session defaults | Data Engineer | Critical |
☐ Verify STAGE configurations for source file format (delimiter, encoding, date format) | Data Engineer | Critical |
☐ Confirm role-based access controls (RBAC) applied — source system permissions do not auto-migrate | Data Engineer | High |
☐ Test sample COPY INTO command with 100-row file before full load | Data Engineer | High |
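For the last check in that list, COPY INTO's VALIDATION_MODE option lets you parse a sample file against the table definition without loading anything; the stage and file names below are placeholders:

```sql
-- Dry-run the load: parse the sample file and report errors,
-- but do not insert any rows.
COPY INTO orders
FROM @migration_stage/sample_100_rows.csv
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1)
VALIDATION_MODE = 'RETURN_ERRORS';
```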
Phase 2 During Migration — Validating Each Load Wave
Snowflake migrations typically load in waves — dimension tables, fact tables, historical archives. Each wave requires an immediate validation pass. The checks below are specific to Snowflake's loading behaviour and should run against every table loaded.
Row Count and Load Completeness
Validation Check | Applies To | Priority |
☐ Row count in Snowflake table matches source system record count exactly | Data Engineer | Critical |
☐ Check Snowflake COPY INTO error logs — confirm zero rejected records | Data Engineer | Critical |
☐ Verify VARIANT/semi-structured column data loaded without truncation | Data Engineer | High |
☐ For incremental loads: confirm delta records identified and loaded correctly | Data Engineer | Critical |
☐ Validate file staging completeness — all source files copied to STAGE before load | Data Engineer | High |
Snowflake-Specific Data Validation
Validation Check | Applies To | Priority |
☐ String column case consistency check — run UPPER(col) = col checks on join key columns | Data Engineer | Critical |
☐ Duplicate primary key check — Snowflake will not reject these | Data Engineer | Critical |
☐ Referential integrity check — foreign keys resolve to parent table records | Data Engineer | Critical |
☐ TIMESTAMP column date-partitioned aggregate comparison vs source | Data Engineer | Critical |
☐ Numeric precision spot check — verify decimal values match to full precision | Data Engineer | High |
☐ NULL value distribution matches source profile — no unexpected null inflation | Data Engineer | High |
☐ VARIANT column query test — confirm JSON/semi-structured data queryable as expected | Data Engineer | High |
☐ String length distribution check — confirm no silent truncation occurred at COPY INTO | Data Engineer | High |
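The truncation check in that list can be sketched by comparing the longest loaded value against the declared column width; table and column names are placeholders:

```sql
-- Longest value actually loaded...
SELECT MAX(LENGTH(product_name)) AS max_loaded_len
FROM products;

-- ...versus the declared width of the target column.
SELECT character_maximum_length
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'PRODUCTS'
  AND column_name = 'PRODUCT_NAME';
```

If the maximum loaded length equals the declared width exactly, treat the column as a truncation suspect and compare against the source.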
Transformation Validation
Validation Check | Applies To | Priority |
☐ All ELT transformation logic produces identical output to source ETL for sample record set | Data Engineer | Critical |
☐ Aggregated metrics (SUM, COUNT, AVG) match source system by key dimensions | Data Engineer | Critical |
☐ Derived date fields calculated correctly using Snowflake date functions | Data Engineer | High |
☐ COALESCE logic produces expected output — verify mixed-type parameter handling | Data Engineer | High |
☐ Any PERIOD column splitting applied correctly — start_date and end_date values accurate | Data Engineer | Critical |
☐ INTERVAL column VARCHAR conversions validated — arithmetic using these fields works as expected | Data Engineer | High |
☐ Business rule validation: calculated KPIs match source system reports for same time period | QA Lead | Critical |
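A sketch of the aggregate comparison, assuming the source extract is staged in Snowflake as `source_snapshot` alongside the transformed `target_orders` (both names hypothetical):

```sql
-- Any row returned is a dimension whose aggregate diverged.
-- IS DISTINCT FROM treats NULLs as comparable values.
SELECT COALESCE(s.region, t.region) AS region,
       s.total_amount AS source_total,
       t.total_amount AS target_total
FROM (SELECT region, SUM(amount) AS total_amount
      FROM source_snapshot GROUP BY region) s
FULL OUTER JOIN
     (SELECT region, SUM(amount) AS total_amount
      FROM target_orders GROUP BY region) t
  ON s.region = t.region
WHERE s.total_amount IS DISTINCT FROM t.total_amount;
```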
Phase 3 Cutover Validation — The Go / No-Go Gate
Cutover validation for a Snowflake migration is the final systematic check before production traffic switches to Snowflake. It must be automated, fast, and produce a clear go/no-go output. The checks below are your cutover gate requirements.
Validation Check | Applies To | Priority |
☐ Final row count parity confirmed across all migrated tables | Data Engineer | Critical |
☐ Automated reconciliation report generated — aggregates match by key dimensions | Data Engineer | Critical |
☐ Full duplicate primary key scan across all fact and dimension tables | Data Engineer | Critical |
☐ Full referential integrity check — zero orphan records | Data Engineer | Critical |
☐ Timestamp comparison: date-grouped aggregates match source for all TIMESTAMP columns | Data Engineer | Critical |
☐ Top 20 production queries tested on Snowflake — output diff against source system | Data Engineer | Critical |
☐ BI dashboards validated — key metrics match source system reports for same date range | QA Lead | Critical |
☐ ELT pipeline connectivity confirmed — all scheduled jobs point to Snowflake | Data Engineer | High |
☐ Connection strings updated in all consuming applications and APIs | Data Engineer | High |
☐ Snowflake virtual warehouse sizing confirmed for expected production load | Data Engineer | High |
☐ RBAC access tested — each user role confirmed to have correct data access in Snowflake | QA Lead | High |
☐ Rollback plan confirmed — can revert to source system within defined window if needed | Data Engineer | Critical |
☐ Signed go/no-go document completed by Data Engineering Lead and Business Owner | QA Lead | Critical |
Cutover timing: Run the automated validation suite immediately before cutover, not hours earlier. Any delta load between validation and cutover is a window for the source to drift from the target.
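One building block for that automated suite is a parity sweep over every migrated table; the `migration_manifest` table holding expected counts captured at source freeze time is an assumption of this sketch:

```sql
-- Compare Snowflake's live row counts against the expected counts
-- recorded from the source system; any row returned is a no-go.
SELECT m.table_name,
       m.expected_rows,
       t.row_count AS loaded_rows
FROM migration_manifest m
JOIN analytics_db.INFORMATION_SCHEMA.TABLES t
  ON t.table_name = m.table_name
WHERE t.row_count <> m.expected_rows;
```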
Post-Cutover: The 30-Day Monitoring Window
The migration is live. This is not when validation stops — it's when a different kind of validation begins. Production usage surfaces issues that test environments never encounter: edge cases in real user data, unexpected query patterns, and incremental load failures that only appear at production volume.
Run these checks for the first 30 days post-cutover:
- Daily row count monitoring — Alert on any table where daily record counts deviate more than 5% from the rolling 7-day average without a corresponding source explanation.
- Incremental load validation — After every scheduled ELT run, confirm the new records loaded match the expected volume from the source system change feed.
- Weekly BI reconciliation — For the first four weeks, run key financial and operational reports against both source and Snowflake and compare outputs. This is your final safety net.
- Query performance baseline — Use Snowflake's Query Profile to identify full table scans and spill-to-disk operations. Snowflake behaves differently from row-store sources — clustering keys and materialized views may be needed.
- User-reported discrepancy triage — Establish a formal channel for data consumers to flag numbers that look wrong. Most real-world data issues are spotted by analysts, not automated checks.
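The daily row count monitor can be sketched as a single Snowflake query; the table and timestamp column are placeholders, and the 5% threshold matches the rule above:

```sql
WITH daily AS (
  SELECT DATE_TRUNC('day', load_ts) AS load_day,
         COUNT(*) AS rows_loaded
  FROM orders
  GROUP BY 1
)
SELECT load_day,
       rows_loaded,
       AVG(rows_loaded) OVER (
         ORDER BY load_day
         ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING
       ) AS trailing_avg
FROM daily
QUALIFY ABS(rows_loaded - trailing_avg) / NULLIF(trailing_avg, 0) > 0.05;
```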
The 4 Most Common Snowflake Migration Validation Failures
1. Timestamps That Look Right But Aren't
The most frequent post-migration complaint: "The numbers are off by a small percentage." Root cause, 80% of the time: timestamp timezone handling. Local-time timestamps loaded into TIMESTAMP_NTZ shift date boundaries when the source system was in a non-UTC timezone. A transaction stamped 11 PM Eastern lands at 3 or 4 AM UTC the next day — crossing day boundaries and shifting daily aggregates.
2. Case-Sensitive Join Failures
JOIN conditions that worked in Teradata, where string comparison is case-insensitive by default, fail silently in Snowflake when the two sides store different casings. They don't fail with an error — they simply drop every row where the case doesn't match. The table loads successfully. The join produces an empty result set that looks like the data doesn't exist.
3. Missing Constraint Enforcement Allows Duplicates
In the source warehouse, a unique constraint on a customer ID column prevented duplicate records from being inserted. In Snowflake, the same constraint definition exists in the DDL but is not enforced. A bug in the ELT logic inserts duplicate customer records. Row count parity checks pass — the expected number of records is there. It's just that some of them are duplicates.
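When a duplicate scan does find offenders, a common Snowflake-side repair (sketched here with assumed `customer_id` and `load_ts` columns) is to keep the latest copy per key with QUALIFY:

```sql
-- Keep exactly one row per customer_id: the most recently loaded.
CREATE OR REPLACE TABLE customers_deduped AS
SELECT *
FROM customers
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY customer_id
  ORDER BY load_ts DESC
) = 1;
```

The real fix, of course, is the ELT bug that inserted the duplicates; deduplication only buys time.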
4. INTERVAL and PERIOD Columns That Convert to VARCHAR
Redshift INTERVAL and Teradata PERIOD columns convert to VARCHAR in Snowflake. The values load. But any downstream transformation or report that performs arithmetic on these columns breaks — VARCHAR doesn't support date arithmetic. The failure typically surfaces weeks later, when an analyst tries to calculate time differences.
Automating Snowflake Migration Validation
The validation checks described in this guide — across three phases, covering data types, timestamps, row counts, transformations, referential integrity, and BI output — are too extensive to run manually at enterprise scale. A typical data warehouse migration involves hundreds of tables, billions of rows, and dozens of transformation rules.
Automated validation platforms can execute this entire checklist — including Snowflake-specific checks — across the full dataset, not a sample, producing a structured validation report after every load wave. When a mapping error is identified and corrected, the suite re-runs and confirms the fix without requiring another manual test cycle.
Vexdata's platform handles Snowflake migration validation end to end — from pre-migration source profiling and data type compatibility checks, through wave-by-wave reconciliation, to cutover sign-off reporting. It connects directly to your source system and your Snowflake target and compares them at row and field level, automatically.
Ready to Validate Your Snowflake Migration?
If you're planning a migration to Snowflake — from Teradata, Redshift, Cloudera, or another platform — the validation framework in this guide gives you the testing structure to do it correctly. Work through each phase. Don't skip the data type audit. Run timestamp checks with date-partitioned aggregates, not string equality.
And if you want to see automated Snowflake migration validation in practice, we've also published a detailed walkthrough of how Vexdata handles Cloudera-to-Snowflake validation specifically — with the exact checks that surface the issues described in this post.
→ Read: Making Cloudera to Snowflake Migration Validation Effortless with Vexdata
→ See the platform: vexdata.io/data-migration | → Book a demo: vexdata.io/contact



