
Snowflake Migration Validation: What to Test Before, During, and After Cutover

  • Writer: Vexdata
  • 7 hours ago
  • 11 min read

Migrating to Snowflake is the most common data warehouse migration happening right now. Teradata to Snowflake. Redshift to Snowflake. Cloudera / Hadoop to Snowflake. Oracle to Snowflake. The reasons are well understood: elastic compute, consumption-based pricing, separation of storage and compute, and native support for semi-structured data at scale.


What's less discussed is why so many of these migrations pass technical validation and still fail in production. The pipeline completes. Snowflake shows the tables loaded. No errors in the job logs. And two weeks later, a BI analyst notices that revenue figures for Q3 are off by 3% — and nobody can immediately explain why.


The answer is almost always one of three things: a data type that behaves differently in Snowflake than it did in the source system, a timestamp that silently shifted due to timezone handling, or a transformation rule that was applied in the source ETL and wasn't carried over to the new ELT pipeline.


This guide is written for data engineers and data platform teams running Snowflake migrations. It covers the Snowflake-specific validation checks that generic migration guides don't — the data type traps, the timestamp edge cases, the NULL semantics differences, and the cutover checks that actually matter when you're under go-live pressure.


"Every failed Snowflake migration I've rescued had the same root cause: validation treated as a checkbox, not a phase." — VP of Data Engineering (cited in Aegis Softtech migration guide)


Why Snowflake Migrations Need Snowflake-Specific Validation


Generic migration testing advice — profile your source, check row counts, validate transformations — applies to every migration. But Snowflake introduces a set of platform-specific behaviours that your validation suite must account for explicitly. If you're treating a Snowflake migration like any other database-to-database move, these will catch you.

Case Sensitivity


Teradata is case-insensitive by default. Snowflake is case-sensitive. The string values 'Glass', 'GLASS', and 'glass' are three distinct values in Snowflake. If your source system stored values inconsistently — some uppercase, some mixed — and your queries relied on case-insensitive comparisons, those queries will return incorrect results in Snowflake without throwing any errors.


⚠  Validation requirement: For every string column used in a join or filter condition, check for case inconsistency in the source data and confirm Snowflake queries handle it explicitly — with UPPER()/LOWER(), or by defining the columns with a case-insensitive collation (e.g., COLLATE 'en-ci').
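A quick way to surface case-inconsistent keys before they break joins — a sketch, with illustrative table and column names:

```sql
-- Group by the normalized value; more than one distinct raw spelling
-- means the key is stored with inconsistent casing in the source data
SELECT UPPER(product_code) AS normalized_key,
       COUNT(DISTINCT product_code) AS case_variants
FROM analytics.dim_product
GROUP BY 1
HAVING COUNT(DISTINCT product_code) > 1;
```

Any rows returned are keys that matched under case-insensitive comparison in the source system but will not match in Snowflake.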


Primary Key and Foreign Key Enforcement


In Teradata, Oracle, and most traditional warehouses, primary key and foreign key constraints are enforced — the database rejects records that violate them. In Snowflake, these constraints are defined but not enforced. Snowflake stores the constraint metadata for query optimisation hints but does not prevent duplicate primary keys or orphan foreign keys from being inserted.

This means that ETL load processes which relied on constraint enforcement to prevent bad data from entering the source warehouse will not get the same protection in Snowflake. Duplicate records and orphan foreign keys that the source system would have rejected will load successfully into Snowflake and quietly corrupt your data model.


⚠  Validation requirement: Run explicit duplicate key checks and referential integrity checks on every loaded table. Do not assume constraint definitions in the DDL provide any enforcement.
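Both checks are cheap to express in SQL (table and key names here are illustrative):

```sql
-- Duplicate primary keys: Snowflake will happily store these
SELECT customer_id, COUNT(*) AS dup_count
FROM analytics.dim_customer
GROUP BY customer_id
HAVING COUNT(*) > 1;

-- Orphan foreign keys: fact rows with no matching dimension row
SELECT COUNT(*) AS orphan_rows
FROM analytics.fact_orders f
LEFT JOIN analytics.dim_customer d
  ON f.customer_id = d.customer_id
WHERE d.customer_id IS NULL;
```

Both queries should come back empty (or zero) before any table is signed off.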


NULL Semantics

Snowflake's NULL handling differs subtly from some source platforms. In Teradata, COALESCE() with mixed numeric and string parameters converts numerics to string. In Snowflake, it converts strings to numeric — which can cause unexpected type coercion errors or silent value changes in columns that mix types.
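One defensive pattern for this difference: cast every COALESCE argument to the intended output type explicitly, so the engine never chooses the coercion for you (names illustrative):

```sql
-- Without the CAST, mixed-type arguments leave the coercion direction
-- up to the engine, which differs between Teradata and Snowflake
SELECT COALESCE(CAST(amount AS VARCHAR), text_note) AS display_value
FROM staging.payments;
```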


The Data Type Mapping Table: Where Migrations Break Silently

This is the section most migration teams skip. Data type differences between source platforms and Snowflake are the primary source of silent data corruption — values that load without errors but arrive with truncated precision, incorrect formatting, or fundamentally different behaviour.

The table below covers the most common and most dangerous mappings across Teradata, Redshift, and Cloudera/Hive:


| Source Type | Snowflake Type | Validation Risk |
|---|---|---|
| Teradata BYTEINT | SMALLINT | Range change: BYTEINT is −128 to 127; SMALLINT is −32,768 to 32,767. Check for downstream logic that assumed the BYTEINT boundary. |
| Teradata PERIOD(DATE) | VARCHAR (2 cols) | Snowflake has no PERIOD type. Columns must be split into start/end DATE columns. Any code using PERIOD functions requires a rewrite. |
| Teradata TIMESTAMP(6) | TIMESTAMP_NTZ(9) | Teradata stores 6 fractional digits; the Snowflake default is 9. Precision mismatch can break exact equality comparisons. |
| Redshift INTERVAL DAY TO SECOND | VARCHAR | Redshift INTERVAL types have no Snowflake equivalent and are converted to VARCHAR. Any arithmetic using these fields breaks silently. |
| Redshift HLLSKETCH | Not supported | Redshift's HyperLogLog sketch type has no Snowflake equivalent. Columns using it must be redesigned — for example, recomputed with Snowflake's approximate-distinct (HLL) functions. |
| Redshift BPCHAR | VARCHAR | Redshift fixed-length CHAR is converted to variable-length VARCHAR. Trailing-space behaviour changes, which affects string comparison logic. |
| Hive STRING | VARCHAR(16777216) | Hive STRING maps to Snowflake's maximum VARCHAR. Existing length constraints are lost; downstream column-length assumptions break. |
| Hive TINYINT | SMALLINT | Range expansion: TINYINT is −128 to 127; SMALLINT is −32,768 to 32,767. Values load correctly but type contracts change. |
| Oracle NUMBER(p,s) | NUMBER(p,s) | Generally safe, but verify precision/scale explicitly. Oracle NUMBER with no declared precision allows up to 38 significant digits with scale up to 127; Snowflake NUMBER defaults to (38,0) and caps scale at 37 — can cause unexpected rounding. |
| SQL Server DATETIME2 | TIMESTAMP_NTZ | Timezone-naive mapping is correct, but fractional-second precision must match. Verify microsecond values are preserved. |


Before migration begins: Run a full data type audit of your source schema. Flag every column using a type in this table and define the explicit validation rule for each one before a single row moves.
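On the target side, one way to sanity-check that audit after the DDL is deployed is to dump every column definition from Snowflake's INFORMATION_SCHEMA and diff it against the mapping document (database and schema names are illustrative):

```sql
-- Every column type Snowflake actually created, for comparison
-- against the source-to-target mapping document
SELECT table_name, column_name, data_type,
       character_maximum_length, numeric_precision, numeric_scale
FROM analytics_db.INFORMATION_SCHEMA.COLUMNS
WHERE table_schema = 'ANALYTICS'
ORDER BY table_name, ordinal_position;
```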


The Timestamp Problem: Snowflake's Most Common Silent Failure


Timestamp handling is the single most common source of post-migration data discrepancies in Snowflake migrations. It deserves its own section.


Snowflake has three timestamp types with distinct behaviour:


TIMESTAMP_NTZ (no timezone) — stores the literal datetime value with no timezone context. The most common default.


TIMESTAMP_TZ (with timezone) — stores the UTC value plus a timezone offset. Values from different timezones are stored with their offset.


TIMESTAMP_LTZ (local timezone) — stores in UTC and converts to the session timezone on display. Output changes based on session context.


When migrating from a source system that stored timestamps in local time — common in legacy Teradata and Oracle warehouses — loading into TIMESTAMP_NTZ without timezone conversion means your timestamps are technically correct but represent the wrong moment in time relative to UTC.


This doesn't produce an error. The values load. Your row counts match. Your field-level comparison passes if you're comparing string representations. But any analysis that groups by date or calculates time differences may be off by hours — silently.


⚠  Test required: For every timestamp column in the migration, compare date-partitioned aggregates between source and target. Do not rely on string equality checks for timestamp validation.


Snowflake Timestamp Validation Queries

Use these during migration testing to catch timezone drift:


-- Check for day-boundary drift between source and target timestamps.
-- IS DISTINCT FROM is NULL-safe; a plain <> would miss rows where one side is NULL.
SELECT DATE_TRUNC('day', source_ts) AS source_day,
       DATE_TRUNC('day', target_ts) AS target_day,
       COUNT(*) AS mismatch_count
FROM source_target_comparison
WHERE DATE_TRUNC('day', source_ts) IS DISTINCT FROM DATE_TRUNC('day', target_ts)
GROUP BY 1, 2;

-- Verify how a TIMESTAMP_NTZ value shifts when interpreted in a local timezone
SELECT CONVERT_TIMEZONE('UTC', 'America/New_York', your_ts_column)
FROM your_table
LIMIT 100;

Phase 1: Pre-Migration — Before Any Data Moves to Snowflake

Pre-migration validation for Snowflake goes beyond generic source profiling. The goal is to identify every data characteristic that will behave differently in Snowflake before the migration begins.


Source Schema Audit for Snowflake Compatibility


| Validation Check | Applies To | Priority |
|---|---|---|
| ☐ Inventory all PERIOD, INTERVAL, and platform-specific data types — map each to a Snowflake equivalent | All sources | Critical |
| ☐ Profile all VARCHAR columns for actual max length — confirm target column definitions are wide enough | All sources | Critical |
| ☐ Audit all TIMESTAMP columns — document the timezone context for each (local, UTC, or naive) | All sources | Critical |
| ☐ Check for CHAR vs VARCHAR columns — document trailing-space behaviour in the source system | Teradata / Oracle | High |
| ☐ Identify all columns used as join keys — flag string keys that may have case inconsistencies | Teradata | Critical |
| ☐ List all PRIMARY KEY and FOREIGN KEY constraints — document enforcement behaviour in the source | All sources | Critical |
| ☐ Audit stored procedures, macros, and UDFs that reference platform-specific functions | Teradata / Oracle | High |
| ☐ Identify Redshift HLLSKETCH columns (no Snowflake equivalent) and review SUPER columns and their conversion strategy | Redshift | High |
| ☐ Document all columns using INTERVAL types in Redshift — confirm the VARCHAR conversion strategy | Redshift | Critical |
| ☐ Run null-rate profiling on all columns — flag NOT NULL columns with nulls in source data | All sources | High |
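For the VARCHAR length profiling check, a simple aggregate run against the source system is enough — a sketch with illustrative names (the length function may be CHAR_LENGTH or CHARACTER_LENGTH depending on the source dialect):

```sql
-- Compare the observed maximum against the target column definition
-- before any rows are loaded
SELECT MAX(LENGTH(customer_name)) AS max_observed_length,
       COUNT(*) AS total_rows
FROM source_db.dim_customer;
```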


Snowflake Target Environment Validation


| Validation Check | Owner | Priority |
|---|---|---|
| ☐ Confirm DDL for all target tables matches the source-to-target mapping document | Data Engineer | Critical |
| ☐ Validate Snowflake database, schema, and table naming conventions are applied consistently | Data Engineer | Medium |
| ☐ Confirm TIMESTAMP column types are explicitly defined (NTZ / TZ / LTZ) — not relying on session defaults | Data Engineer | Critical |
| ☐ Verify STAGE configurations for the source file format (delimiter, encoding, date format) | Data Engineer | Critical |
| ☐ Confirm role-based access controls (RBAC) are applied — source system permissions do not auto-migrate | Data Engineer | High |
| ☐ Test a sample COPY INTO command with a 100-row file before the full load | Data Engineer | High |
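The sample-load test in the last check can be run without committing any data: COPY INTO supports VALIDATION_MODE, which parses the staged file and reports what would happen. A sketch — stage, table, and file-format details are illustrative:

```sql
-- Parses the staged file and returns the first 100 rows as Snowflake
-- would load them, without inserting anything into the table
COPY INTO analytics.dim_customer
FROM @migration_stage/dim_customer_sample.csv
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = '|' SKIP_HEADER = 1)
VALIDATION_MODE = RETURN_100_ROWS;
```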


Phase 2: During Migration — Validating Each Load Wave

Snowflake migrations typically load in waves — dimension tables, fact tables, historical archives. Each wave requires an immediate validation pass. The checks below are specific to Snowflake's loading behaviour and should run against every table loaded.


Row Count and Load Completeness


| Validation Check | Owner | Priority |
|---|---|---|
| ☐ Row count in the Snowflake table matches the source system record count exactly | Data Engineer | Critical |
| ☐ Check Snowflake COPY INTO error logs — confirm zero rejected records | Data Engineer | Critical |
| ☐ Verify VARIANT/semi-structured column data loaded without truncation | Data Engineer | High |
| ☐ For incremental loads: confirm delta records are identified and loaded correctly | Data Engineer | Critical |
| ☐ Validate file staging completeness — all source files copied to the STAGE before load | Data Engineer | High |
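The COPY INTO error-log check above can be automated with Snowflake's COPY_HISTORY table function; a sketch, with an illustrative table name and time window:

```sql
-- Any file where row_parsed <> row_count, or with an error message,
-- needs investigation before the load wave is signed off
SELECT file_name, row_count, row_parsed, first_error_message
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
       TABLE_NAME => 'ANALYTICS.FACT_ORDERS',
       START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())));
```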


Snowflake-Specific Data Validation


| Validation Check | Owner | Priority |
|---|---|---|
| ☐ String column case consistency check — run UPPER(col) = col checks on join key columns | Data Engineer | Critical |
| ☐ Duplicate primary key check — Snowflake will not reject these | Data Engineer | Critical |
| ☐ Referential integrity check — foreign keys resolve to parent table records | Data Engineer | Critical |
| ☐ TIMESTAMP column date-partitioned aggregate comparison vs source | Data Engineer | Critical |
| ☐ Numeric precision spot check — verify decimal values match to full precision | Data Engineer | High |
| ☐ NULL value distribution matches the source profile — no unexpected null inflation | Data Engineer | High |
| ☐ VARIANT column query test — confirm JSON/semi-structured data is queryable as expected | Data Engineer | High |
| ☐ String length distribution check — confirm no silent truncation occurred at COPY INTO | Data Engineer | High |
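For a fast whole-table fingerprint before the slower field-level checks, Snowflake's HASH_AGG aggregate produces an order-independent hash of all rows. This only works when a snapshot of the source data is also queryable from Snowflake; table names are illustrative:

```sql
-- If the two fingerprints match, the tables agree row-for-row;
-- if they differ, drill down with column-level comparisons
SELECT HASH_AGG(*) FROM analytics.fact_orders;
SELECT HASH_AGG(*) FROM migration_audit.fact_orders_source_snapshot;
```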


Transformation Validation


| Validation Check | Owner | Priority |
|---|---|---|
| ☐ All ELT transformation logic produces identical output to the source ETL for a sample record set | Data Engineer | Critical |
| ☐ Aggregated metrics (SUM, COUNT, AVG) match the source system by key dimensions | Data Engineer | Critical |
| ☐ Derived date fields calculated correctly using Snowflake date functions | Data Engineer | High |
| ☐ COALESCE logic produces expected output — verify mixed-type parameter handling | Data Engineer | High |
| ☐ Any PERIOD column splitting applied correctly — start_date and end_date values accurate | Data Engineer | Critical |
| ☐ INTERVAL column VARCHAR conversions validated — arithmetic using these fields works as expected | Data Engineer | High |
| ☐ Business rule validation: calculated KPIs match source system reports for the same time period | QA Lead | Critical |
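One way to implement the aggregate-match checks above, assuming a snapshot of the source data has been staged in Snowflake for comparison (all names illustrative): a MINUS between the two aggregate result sets returns only the groups that disagree.

```sql
-- Empty result = aggregates match for every region
SELECT region, SUM(order_amount) AS total_amount, COUNT(*) AS order_count
FROM analytics.fact_orders
GROUP BY region
MINUS
SELECT region, SUM(order_amount), COUNT(*)
FROM migration_audit.fact_orders_source_snapshot
GROUP BY region;
```

Run it in both directions (swap the operands) to also catch groups that exist on one side only.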


Phase 3: Cutover Validation — The Go/No-Go Gate

Cutover validation for a Snowflake migration is the final systematic check before production traffic switches to Snowflake. It must be automated, fast, and produce a clear go/no-go output. The checks below are your cutover gate requirements.


| Validation Check | Owner | Priority |
|---|---|---|
| ☐ Final row count parity confirmed across all migrated tables | Data Engineer | Critical |
| ☐ Automated reconciliation report generated — aggregates match by key dimensions | Data Engineer | Critical |
| ☐ Full duplicate primary key scan across all fact and dimension tables | Data Engineer | Critical |
| ☐ Full referential integrity check — zero orphan records | Data Engineer | Critical |
| ☐ Timestamp comparison: date-grouped aggregates match source for all TIMESTAMP columns | Data Engineer | Critical |
| ☐ Top 20 production queries tested on Snowflake — output diffed against the source system | Data Engineer | Critical |
| ☐ BI dashboards validated — key metrics match source system reports for the same date range | QA Lead | Critical |
| ☐ ELT pipeline connectivity confirmed — all scheduled jobs point to Snowflake | Data Engineer | High |
| ☐ Connection strings updated in all consuming applications and APIs | Data Engineer | High |
| ☐ Snowflake virtual warehouse sizing confirmed for expected production load | Data Engineer | High |
| ☐ RBAC access tested — each user role confirmed to have correct data access in Snowflake | QA Lead | High |
| ☐ Rollback plan confirmed — can revert to the source system within a defined window if needed | Data Engineer | Critical |
| ☐ Signed go/no-go document completed by the Data Engineering Lead and Business Owner | QA Lead | Critical |


Cutover timing: Run the automated validation suite immediately before cutover, not hours earlier. Any delta load between validation and cutover is a window for the source to drift from the target.


Post-Cutover: The 30-Day Monitoring Window


The migration is live. This is not when validation stops — it's when a different kind of validation begins. Production usage surfaces issues that test environments never encounter: edge cases in real user data, unexpected query patterns, and incremental load failures that only appear at production volume.


Run these checks for the first 30 days post-cutover:

  • Daily row count monitoring — Alert on any table where daily record counts deviate more than 5% from the rolling 7-day average without a corresponding source explanation.

  • Incremental load validation — After every scheduled ELT run, confirm the new records loaded match the expected volume from the source system change feed.

  • Weekly BI reconciliation — For the first four weeks, run key financial and operational reports against both source and Snowflake and compare outputs. This is your final safety net.

  • Query performance baseline — Use Snowflake's Query Profile to identify full table scans and spilling-to-disk operations. Snowflake behaves differently from row-store sources — clustering keys and materialized views may be needed.

  • User-reported discrepancy triage — Establish a formal channel for data consumers to flag numbers that look wrong. Most real-world data issues are spotted by analysts, not automated checks.
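The daily row count monitor in the first bullet can be expressed as a scheduled Snowflake query; this sketch assumes an illustrative fact table with a load-timestamp column:

```sql
WITH daily AS (
    SELECT DATE_TRUNC('day', load_timestamp) AS load_day,
           COUNT(*) AS rows_loaded
    FROM analytics.fact_orders
    GROUP BY 1
), with_baseline AS (
    SELECT load_day, rows_loaded,
           -- rolling average of the prior 7 days, excluding today
           AVG(rows_loaded) OVER (
               ORDER BY load_day
               ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS avg_prior_7d
    FROM daily
)
SELECT *
FROM with_baseline
WHERE avg_prior_7d IS NOT NULL
  AND ABS(rows_loaded - avg_prior_7d) / NULLIF(avg_prior_7d, 0) > 0.05;
```

Wire the output to an alert channel; any row returned is a day that breached the 5% threshold.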


The 4 Most Common Snowflake Migration Validation Failures


1.  Timestamps That Look Right But Aren't

The most frequent post-migration complaint: "The numbers are off by a small percentage." Root cause, 80% of the time: timestamp timezone handling. Local-time timestamps loaded into TIMESTAMP_NTZ carry no timezone context, so downstream logic ends up treating them as UTC. A transaction that actually happened at 11 PM Eastern is 3 AM or 4 AM UTC the next day — so wherever conversion is applied (or skipped) inconsistently, records cross day boundaries and daily aggregates shift.


2.  Case-Sensitive Join Failures

JOIN conditions that worked in Teradata because comparisons were case-insensitive fail silently in Snowflake. They don't fail with an error — they simply return no match for records where the case differs. The table loads successfully. The join produces missing rows that look like the data doesn't exist.


3.  Missing Constraint Enforcement Allows Duplicates

In the source warehouse, a unique constraint on a customer ID column prevented duplicate records from being inserted. In Snowflake, the same constraint definition exists in the DDL but is not enforced. A bug in the ELT logic inserts duplicate customer records. Row count parity checks pass — the expected number of records is there. It's just that some of them are duplicates.


4.  INTERVAL and PERIOD Columns That Convert to VARCHAR

Redshift INTERVAL and Teradata PERIOD columns convert to VARCHAR in Snowflake. The values load. But any downstream transformation or report that performs arithmetic on these columns breaks — VARCHAR values don't support date arithmetic without explicit parsing. The failure surfaces weeks later, when an analyst tries to calculate time differences.


Automating Snowflake Migration Validation


The validation checks described in this guide — across three phases, covering data types, timestamps, row counts, transformations, referential integrity, and BI output — are too extensive to run manually at enterprise scale. A typical data warehouse migration involves hundreds of tables, billions of rows, and dozens of transformation rules.


Automated validation platforms can execute this entire checklist — including Snowflake-specific checks — across the full dataset, not a sample, producing a structured validation report after every load wave. When a mapping error is identified and corrected, the suite re-runs and confirms the fix without requiring another manual test cycle.


Vexdata's platform handles Snowflake migration validation end to end — from pre-migration source profiling and data type compatibility checks, through wave-by-wave reconciliation, to cutover sign-off reporting. It connects directly to your source system and your Snowflake target and compares them at row and field level, automatically.


Ready to Validate Your Snowflake Migration?


If you're planning a migration to Snowflake — from Teradata, Redshift, Cloudera, or another platform — the validation framework in this guide gives you the testing structure to do it correctly. Work through each phase. Don't skip the data type audit. Run timestamp checks with date-partitioned aggregates, not string equality.


And if you want to see automated Snowflake migration validation in practice, we've also published a detailed walkthrough of how Vexdata handles Cloudera-to-Snowflake validation specifically — with the exact checks that surface the issues described in this post.


→  Read: Making Cloudera to Snowflake Migration Validation Effortless with Vexdata

→  See the platform: vexdata.io/data-migration     |     →  Book a demo: vexdata.io/contact

 
 
 
