
Snowflake Migration Validation: What to Test Before, During, and After Cutover

  • Writer: Vexdata
  • 7 hours ago
  • 11 min read

Migrating to Snowflake is the most common data warehouse migration happening right now. Teradata to Snowflake. Redshift to Snowflake. Cloudera / Hadoop to Snowflake. Oracle to Snowflake. The reasons are well understood: elastic compute, consumption-based pricing, separation of storage and compute, and native support for semi-structured data at scale.


What's less discussed is why so many of these migrations pass technical validation and still fail in production. The pipeline completes. Snowflake shows the tables loaded. No errors in the job logs. And two weeks later, a BI analyst notices that revenue figures for Q3 are off by 3% — and nobody can immediately explain why.


The answer is almost always one of three things: a data type that behaves differently in Snowflake than it did in the source system, a timestamp that silently shifted due to timezone handling, or a transformation rule that was applied in the source ETL and wasn't carried over to the new ELT pipeline.


This guide is written for data engineers and data platform teams running Snowflake migrations. It covers the Snowflake-specific validation checks that generic migration guides don't — the data type traps, the timestamp edge cases, the NULL semantics differences, and the cutover checks that actually matter when you're under go-live pressure.


"Every failed Snowflake migration I've rescued had the same root cause: validation treated as a checkbox, not a phase." — VP of Data Engineering (cited in Aegis Softtech migration guide)


Why Snowflake Migrations Need Snowflake-Specific Validation


Generic migration testing advice — profile your source, check row counts, validate transformations — applies to every migration. But Snowflake introduces a set of platform-specific behaviours that your validation suite must account for explicitly. If you're treating a Snowflake migration like any other database-to-database move, these will catch you.

Case Sensitivity


Teradata is case-insensitive by default. Snowflake is case-sensitive. The string values 'Glass', 'GLASS', and 'glass' are three distinct values in Snowflake. If your source system stored values inconsistently — some uppercase, some mixed — and your queries relied on case-insensitive comparisons, those queries will return incorrect results in Snowflake without throwing any errors.


⚠  Validation requirement: For every string column used in a join or filter condition, check for case inconsistency in the source data and confirm Snowflake queries handle it explicitly — with UPPER()/LOWER(), or by defining the columns with a case-insensitive collation (e.g., COLLATE 'en-ci').
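A quick way to surface case-inconsistent keys before they break joins — a sketch, with illustrative table and column names:

```sql
-- Group by the normalized value; more than one distinct raw spelling
-- means the key is stored with inconsistent casing in the source data
SELECT UPPER(product_code) AS normalized_key,
       COUNT(DISTINCT product_code) AS case_variants
FROM analytics.dim_product
GROUP BY 1
HAVING COUNT(DISTINCT product_code) > 1;
```

Any rows returned are keys that matched under case-insensitive comparison in the source system but will not match in Snowflake.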


Primary Key and Foreign Key Enforcement


In Teradata, Oracle, and most traditional warehouses, primary key and foreign key constraints are enforced — the database rejects records that violate them. In Snowflake, these constraints are defined but not enforced. Snowflake stores the constraint metadata for query optimisation hints but does not prevent duplicate primary keys or orphan foreign keys from being inserted.

This means that ETL load processes which relied on constraint enforcement to prevent bad data from entering the source warehouse will not get the same protection in Snowflake. Duplicate records and orphan foreign keys that the source system would have rejected will load successfully into Snowflake and quietly corrupt your data model.


⚠  Validation requirement: Run explicit duplicate key checks and referential integrity checks on every loaded table. Do not assume constraint definitions in the DDL provide any enforcement.
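Both checks are cheap to express in SQL (table and key names here are illustrative):

```sql
-- Duplicate primary keys: Snowflake will happily store these
SELECT customer_id, COUNT(*) AS dup_count
FROM analytics.dim_customer
GROUP BY customer_id
HAVING COUNT(*) > 1;

-- Orphan foreign keys: fact rows with no matching dimension row
SELECT COUNT(*) AS orphan_rows
FROM analytics.fact_orders f
LEFT JOIN analytics.dim_customer d
  ON f.customer_id = d.customer_id
WHERE d.customer_id IS NULL;
```

Both queries should come back empty (or zero) before any table is signed off.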


NULL Semantics

Snowflake's NULL handling differs subtly from some source platforms. In Teradata, COALESCE() with mixed numeric and string parameters converts numerics to string. In Snowflake, it converts strings to numeric — which can cause unexpected type coercion errors or silent value changes in columns that mix types.
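One defensive pattern for this difference: cast every COALESCE argument to the intended output type explicitly, so the engine never chooses the coercion for you (names illustrative):

```sql
-- Without the CAST, mixed-type arguments leave the coercion direction
-- up to the engine, which differs between Teradata and Snowflake
SELECT COALESCE(CAST(amount AS VARCHAR), text_note) AS display_value
FROM staging.payments;
```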


The Data Type Mapping Table: Where Migrations Break Silently

This is the section most migration teams skip. Data type differences between source platforms and Snowflake are the primary source of silent data corruption — values that load without errors but arrive with truncated precision, incorrect formatting, or fundamentally different behaviour.

The table below covers the most common and most dangerous mappings across Teradata, Redshift, and Cloudera/Hive:


| Source Type | Snowflake Type | Validation Risk |
|---|---|---|
| Teradata BYTEINT | SMALLINT | Range change: BYTEINT is −128 to 127; SMALLINT is −32,768 to 32,767. Check for downstream logic that assumed the BYTEINT boundary. |
| Teradata PERIOD(DATE) | VARCHAR (2 cols) | Snowflake has no PERIOD type. Columns must be split into start/end DATE columns. Any code using PERIOD functions requires a rewrite. |
| Teradata TIMESTAMP(6) | TIMESTAMP_NTZ(9) | Teradata stores 6 fractional digits; the Snowflake default is 9. Precision mismatch can break exact equality comparisons. |
| Redshift INTERVAL DAY TO SECOND | VARCHAR | Redshift INTERVAL types have no Snowflake equivalent and are converted to VARCHAR. Any arithmetic using these fields breaks silently. |
| Redshift HLLSKETCH | Not supported | Redshift's HyperLogLog sketch type has no Snowflake equivalent. Columns using it must be redesigned — for example, recomputed with Snowflake's approximate-distinct (HLL) functions. |
| Redshift BPCHAR | VARCHAR | Redshift fixed-length CHAR is converted to variable-length VARCHAR. Trailing-space behaviour changes, which affects string comparison logic. |
| Hive STRING | VARCHAR(16777216) | Hive STRING maps to Snowflake's maximum VARCHAR. Existing length constraints are lost; downstream column-length assumptions break. |
| Hive TINYINT | SMALLINT | Range expansion: TINYINT is −128 to 127; SMALLINT is −32,768 to 32,767. Values load correctly but type contracts change. |
| Oracle NUMBER(p,s) | NUMBER(p,s) | Generally safe, but verify precision/scale explicitly. Oracle NUMBER with no declared precision allows up to 38 significant digits with scale up to 127; Snowflake NUMBER defaults to (38,0) and caps scale at 37 — can cause unexpected rounding. |
| SQL Server DATETIME2 | TIMESTAMP_NTZ | Timezone-naive mapping is correct, but fractional-second precision must match. Verify microsecond values are preserved. |


Before migration begins: Run a full data type audit of your source schema. Flag every column using a type in this table and define the explicit validation rule for each one before a single row moves.
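On the target side, one way to sanity-check that audit after the DDL is deployed is to dump every column definition from Snowflake's INFORMATION_SCHEMA and diff it against the mapping document (database and schema names are illustrative):

```sql
-- Every column type Snowflake actually created, for comparison
-- against the source-to-target mapping document
SELECT table_name, column_name, data_type,
       character_maximum_length, numeric_precision, numeric_scale
FROM analytics_db.INFORMATION_SCHEMA.COLUMNS
WHERE table_schema = 'ANALYTICS'
ORDER BY table_name, ordinal_position;
```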


The Timestamp Problem: Snowflake's Most Common Silent Failure


Timestamp handling is the single most common source of post-migration data discrepancies in Snowflake migrations. It deserves its own section.


Snowflake has three timestamp types with distinct behaviour:


TIMESTAMP_NTZ (no timezone) — stores the literal datetime value with no timezone context. The most common default.


TIMESTAMP_TZ (with timezone) — stores the UTC value plus a timezone offset. Values from different timezones are stored with their offset.


TIMESTAMP_LTZ (local timezone) — stores in UTC and converts to the session timezone on display. Output changes based on session context.


When migrating from a source system that stored timestamps in local time — common in legacy Teradata and Oracle warehouses — loading into TIMESTAMP_NTZ without timezone conversion means your timestamps are technically correct but represent the wrong moment in time relative to UTC.


This doesn't produce an error. The values load. Your row counts match. Your field-level comparison passes if you're comparing string representations. But any analysis that groups by date or calculates time differences may be off by hours — silently.


⚠  Test required: For every timestamp column in the migration, compare date-partitioned aggregates between source and target. Do not rely on string equality checks for timestamp validation.


Snowflake Timestamp Validation Queries

Use these during migration testing to catch timezone drift:


-- Check for day-boundary drift between source and target timestamps.
-- IS DISTINCT FROM is NULL-safe; a plain <> would miss rows where one side is NULL.
SELECT DATE_TRUNC('day', source_ts) AS source_day,
       DATE_TRUNC('day', target_ts) AS target_day,
       COUNT(*) AS mismatch_count
FROM source_target_comparison
WHERE DATE_TRUNC('day', source_ts) IS DISTINCT FROM DATE_TRUNC('day', target_ts)
GROUP BY 1, 2;

-- Verify how a TIMESTAMP_NTZ value shifts when interpreted in a local timezone
SELECT CONVERT_TIMEZONE('UTC', 'America/New_York', your_ts_column)
FROM your_table
LIMIT 100;

Phase 1: Pre-Migration — Before Any Data Moves to Snowflake

Pre-migration validation for Snowflake goes beyond generic source profiling. The goal is to identify every data characteristic that will behave differently in Snowflake before the migration begins.


Source Schema Audit for Snowflake Compatibility


| Validation Check | Applies To | Priority |
|---|---|---|
| ☐ Inventory all PERIOD, INTERVAL, and platform-specific data types — map each to a Snowflake equivalent | All sources | Critical |
| ☐ Profile all VARCHAR columns for actual max length — confirm target column definitions are wide enough | All sources | Critical |
| ☐ Audit all TIMESTAMP columns — document the timezone context for each (local, UTC, or naive) | All sources | Critical |
| ☐ Check for CHAR vs VARCHAR columns — document trailing-space behaviour in the source system | Teradata / Oracle | High |
| ☐ Identify all columns used as join keys — flag string keys that may have case inconsistencies | Teradata | Critical |
| ☐ List all PRIMARY KEY and FOREIGN KEY constraints — document enforcement behaviour in the source | All sources | Critical |
| ☐ Audit stored procedures, macros, and UDFs that reference platform-specific functions | Teradata / Oracle | High |
| ☐ Identify Redshift HLLSKETCH columns (no Snowflake equivalent) and review SUPER columns and their conversion strategy | Redshift | High |
| ☐ Document all columns using INTERVAL types in Redshift — confirm the VARCHAR conversion strategy | Redshift | Critical |
| ☐ Run null-rate profiling on all columns — flag NOT NULL columns with nulls in source data | All sources | High |
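For the VARCHAR length profiling check, a simple aggregate run against the source system is enough — a sketch with illustrative names (the length function may be CHAR_LENGTH or CHARACTER_LENGTH depending on the source dialect):

```sql
-- Compare the observed maximum against the target column definition
-- before any rows are loaded
SELECT MAX(LENGTH(customer_name)) AS max_observed_length,
       COUNT(*) AS total_rows
FROM source_db.dim_customer;
```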


Snowflake Target Environment Validation


| Validation Check | Owner | Priority |
|---|---|---|
| ☐ Confirm DDL for all target tables matches the source-to-target mapping document | Data Engineer | Critical |
| ☐ Validate Snowflake database, schema, and table naming conventions are applied consistently | Data Engineer | Medium |
| ☐ Confirm TIMESTAMP column types are explicitly defined (NTZ / TZ / LTZ) — not relying on session defaults | Data Engineer | Critical |
| ☐ Verify STAGE configurations for the source file format (delimiter, encoding, date format) | Data Engineer | Critical |
| ☐ Confirm role-based access controls (RBAC) are applied — source system permissions do not auto-migrate | Data Engineer | High |
| ☐ Test a sample COPY INTO command with a 100-row file before the full load | Data Engineer | High |
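The sample-load test in the last check can be run without committing any data: COPY INTO supports VALIDATION_MODE, which parses the staged file and reports what would happen. A sketch — stage, table, and file-format details are illustrative:

```sql
-- Parses the staged file and returns the first 100 rows as Snowflake
-- would load them, without inserting anything into the table
COPY INTO analytics.dim_customer
FROM @migration_stage/dim_customer_sample.csv
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = '|' SKIP_HEADER = 1)
VALIDATION_MODE = RETURN_100_ROWS;
```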


Phase 2: During Migration — Validating Each Load Wave

Snowflake migrations typically load in waves — dimension tables, fact tables, historical archives. Each wave requires an immediate validation pass. The checks below are specific to Snowflake's loading behaviour and should run against every table loaded.


Row Count and Load Completeness


| Validation Check | Owner | Priority |
|---|---|---|
| ☐ Row count in the Snowflake table matches the source system record count exactly | Data Engineer | Critical |
| ☐ Check Snowflake COPY INTO error logs — confirm zero rejected records | Data Engineer | Critical |
| ☐ Verify VARIANT/semi-structured column data loaded without truncation | Data Engineer | High |
| ☐ For incremental loads: confirm delta records are identified and loaded correctly | Data Engineer | Critical |
| ☐ Validate file staging completeness — all source files copied to the STAGE before load | Data Engineer | High |
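The COPY INTO error-log check above can be automated with Snowflake's COPY_HISTORY table function; a sketch, with an illustrative table name and time window:

```sql
-- Any file where row_parsed <> row_count, or with an error message,
-- needs investigation before the load wave is signed off
SELECT file_name, row_count, row_parsed, first_error_message
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
       TABLE_NAME => 'ANALYTICS.FACT_ORDERS',
       START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())));
```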


Snowflake-Specific Data Validation


| Validation Check | Owner | Priority |
|---|---|---|
| ☐ String column case consistency check — run UPPER(col) = col checks on join key columns | Data Engineer | Critical |
| ☐ Duplicate primary key check — Snowflake will not reject these | Data Engineer | Critical |
| ☐ Referential integrity check — foreign keys resolve to parent table records | Data Engineer | Critical |
| ☐ TIMESTAMP column date-partitioned aggregate comparison vs source | Data Engineer | Critical |
| ☐ Numeric precision spot check — verify decimal values match to full precision | Data Engineer | High |
| ☐ NULL value distribution matches the source profile — no unexpected null inflation | Data Engineer | High |
| ☐ VARIANT column query test — confirm JSON/semi-structured data is queryable as expected | Data Engineer | High |
| ☐ String length distribution check — confirm no silent truncation occurred at COPY INTO | Data Engineer | High |
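For a fast whole-table fingerprint before the slower field-level checks, Snowflake's HASH_AGG aggregate produces an order-independent hash of all rows. This only works when a snapshot of the source data is also queryable from Snowflake; table names are illustrative:

```sql
-- If the two fingerprints match, the tables agree row-for-row;
-- if they differ, drill down with column-level comparisons
SELECT HASH_AGG(*) FROM analytics.fact_orders;
SELECT HASH_AGG(*) FROM migration_audit.fact_orders_source_snapshot;
```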


Transformation Validation


| Validation Check | Owner | Priority |
|---|---|---|
| ☐ All ELT transformation logic produces identical output to the source ETL for a sample record set | Data Engineer | Critical |
| ☐ Aggregated metrics (SUM, COUNT, AVG) match the source system by key dimensions | Data Engineer | Critical |
| ☐ Derived date fields calculated correctly using Snowflake date functions | Data Engineer | High |
| ☐ COALESCE logic produces expected output — verify mixed-type parameter handling | Data Engineer | High |
| ☐ Any PERIOD column splitting applied correctly — start_date and end_date values accurate | Data Engineer | Critical |
| ☐ INTERVAL column VARCHAR conversions validated — arithmetic using these fields works as expected | Data Engineer | High |
| ☐ Business rule validation: calculated KPIs match source system reports for the same time period | QA Lead | Critical |
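One way to implement the aggregate-match checks above, assuming a snapshot of the source data has been staged in Snowflake for comparison (all names illustrative): a MINUS between the two aggregate result sets returns only the groups that disagree.

```sql
-- Empty result = aggregates match for every region
SELECT region, SUM(order_amount) AS total_amount, COUNT(*) AS order_count
FROM analytics.fact_orders
GROUP BY region
MINUS
SELECT region, SUM(order_amount), COUNT(*)
FROM migration_audit.fact_orders_source_snapshot
GROUP BY region;
```

Run it in both directions (swap the operands) to also catch groups that exist on one side only.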


Phase 3: Cutover Validation — The Go/No-Go Gate

Cutover validation for a Snowflake migration is the final systematic check before production traffic switches to Snowflake. It must be automated, fast, and produce a clear go/no-go output. The checks below are your cutover gate requirements.


| Validation Check | Owner | Priority |
|---|---|---|
| ☐ Final row count parity confirmed across all migrated tables | Data Engineer | Critical |
| ☐ Automated reconciliation report generated — aggregates match by key dimensions | Data Engineer | Critical |
| ☐ Full duplicate primary key scan across all fact and dimension tables | Data Engineer | Critical |
| ☐ Full referential integrity check — zero orphan records | Data Engineer | Critical |
| ☐ Timestamp comparison: date-grouped aggregates match source for all TIMESTAMP columns | Data Engineer | Critical |
| ☐ Top 20 production queries tested on Snowflake — output diffed against the source system | Data Engineer | Critical |
| ☐ BI dashboards validated — key metrics match source system reports for the same date range | QA Lead | Critical |
| ☐ ELT pipeline connectivity confirmed — all scheduled jobs point to Snowflake | Data Engineer | High |
| ☐ Connection strings updated in all consuming applications and APIs | Data Engineer | High |
| ☐ Snowflake virtual warehouse sizing confirmed for expected production load | Data Engineer | High |
| ☐ RBAC access tested — each user role confirmed to have correct data access in Snowflake | QA Lead | High |
| ☐ Rollback plan confirmed — can revert to the source system within a defined window if needed | Data Engineer | Critical |
| ☐ Signed go/no-go document completed by the Data Engineering Lead and Business Owner | QA Lead | Critical |


Cutover timing: Run the automated validation suite immediately before cutover, not hours earlier. Any delta load between validation and cutover is a window for the source to drift from the target.


Post-Cutover: The 30-Day Monitoring Window


The migration is live. This is not when validation stops — it's when a different kind of validation begins. Production usage surfaces issues that test environments never encounter: edge cases in real user data, unexpected query patterns, and incremental load failures that only appear at production volume.


Run these checks for the first 30 days post-cutover:

  • Daily row count monitoring — Alert on any table where daily record counts deviate more than 5% from the rolling 7-day average without a corresponding source explanation.

  • Incremental load validation — After every scheduled ELT run, confirm the new records loaded match the expected volume from the source system change feed.

  • Weekly BI reconciliation — For the first four weeks, run key financial and operational reports against both source and Snowflake and compare outputs. This is your final safety net.

  • Query performance baseline — Use Snowflake's Query Profile to identify full table scans and spilling-to-disk operations. Snowflake behaves differently from row-store sources — clustering keys and materialized views may be needed.

  • User-reported discrepancy triage — Establish a formal channel for data consumers to flag numbers that look wrong. Most real-world data issues are spotted by analysts, not automated checks.
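The daily row count monitor in the first bullet can be expressed as a scheduled Snowflake query; this sketch assumes an illustrative fact table with a load-timestamp column:

```sql
WITH daily AS (
    SELECT DATE_TRUNC('day', load_timestamp) AS load_day,
           COUNT(*) AS rows_loaded
    FROM analytics.fact_orders
    GROUP BY 1
), with_baseline AS (
    SELECT load_day, rows_loaded,
           -- rolling average of the prior 7 days, excluding today
           AVG(rows_loaded) OVER (
               ORDER BY load_day
               ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS avg_prior_7d
    FROM daily
)
SELECT *
FROM with_baseline
WHERE avg_prior_7d IS NOT NULL
  AND ABS(rows_loaded - avg_prior_7d) / NULLIF(avg_prior_7d, 0) > 0.05;
```

Wire the output to an alert channel; any row returned is a day that breached the 5% threshold.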


The 4 Most Common Snowflake Migration Validation Failures


1.  Timestamps That Look Right But Aren't

The most frequent post-migration complaint: "The numbers are off by a small percentage." Root cause, 80% of the time: timestamp timezone handling. Local-time timestamps loaded into TIMESTAMP_NTZ carry no timezone context, so downstream logic ends up treating them as UTC. A transaction that actually happened at 11 PM Eastern is 3 AM or 4 AM UTC the next day — so wherever conversion is applied (or skipped) inconsistently, records cross day boundaries and daily aggregates shift.


2.  Case-Sensitive Join Failures

JOIN conditions that worked in Teradata because comparisons were case-insensitive fail silently in Snowflake. They don't fail with an error — they simply return no match for records where the case differs. The table loads successfully. The join produces missing rows that look like the data doesn't exist.


3.  Missing Constraint Enforcement Allows Duplicates

In the source warehouse, a unique constraint on a customer ID column prevented duplicate records from being inserted. In Snowflake, the same constraint definition exists in the DDL but is not enforced. A bug in the ELT logic inserts duplicate customer records. Row count parity checks pass — the expected number of records is there. It's just that some of them are duplicates.


4.  INTERVAL and PERIOD Columns That Convert to VARCHAR

Redshift INTERVAL and Teradata PERIOD columns convert to VARCHAR in Snowflake. The values load. But any downstream transformation or report that performs arithmetic on these columns breaks — VARCHAR values don't support date arithmetic without explicit parsing. The failure surfaces weeks later, when an analyst tries to calculate time differences.


Automating Snowflake Migration Validation


The validation checks described in this guide — across three phases, covering data types, timestamps, row counts, transformations, referential integrity, and BI output — are too extensive to run manually at enterprise scale. A typical data warehouse migration involves hundreds of tables, billions of rows, and dozens of transformation rules.


Automated validation platforms can execute this entire checklist — including Snowflake-specific checks — across the full dataset, not a sample, producing a structured validation report after every load wave. When a mapping error is identified and corrected, the suite re-runs and confirms the fix without requiring another manual test cycle.


Vexdata's platform handles Snowflake migration validation end to end — from pre-migration source profiling and data type compatibility checks, through wave-by-wave reconciliation, to cutover sign-off reporting. It connects directly to your source system and your Snowflake target and compares them at row and field level, automatically.


Ready to Validate Your Snowflake Migration?


If you're planning a migration to Snowflake — from Teradata, Redshift, Cloudera, or another platform — the validation framework in this guide gives you the testing structure to do it correctly. Work through each phase. Don't skip the data type audit. Run timestamp checks with date-partitioned aggregates, not string equality.


And if you want to see automated Snowflake migration validation in practice, we've also published a detailed walkthrough of how Vexdata handles Cloudera-to-Snowflake validation specifically — with the exact checks that surface the issues described in this post.


→  Read: Making Cloudera to Snowflake Migration Validation Effortless with Vexdata

→  See the platform: vexdata.io/data-migration     |     →  Book a demo: vexdata.io/contact

 
 
 
