Configuration Guide

Veridelta is driven by a configuration object (or YAML file) that defines how two datasets should be aligned and compared.

Core Settings

The following fields are required for every comparison:

Field	Description
`primary_keys`	A list of columns used to join and align the datasets (e.g., `['id']`).
`source`	Configuration for the "Legacy" or "Left" dataset.
`target`	Configuration for the "Modern" or "Right" dataset.

Schema Modes

The schema_mode determines how Veridelta handles columns that don't match between datasets.

intersection (Default): Only compare columns present in both datasets.
exact: Fail if columns or their order do not match perfectly.
allow_additions: Allow the Target to have new columns not found in the Source.
allow_removals: Allow the Target to drop columns found in the Source.

Column Rules

Rules allow you to define tolerances for specific columns or patterns.

Numeric Tolerances

Use these to ignore floating-point jitter in financial or scientific data.

rules:
  - column_names: ["total_amount"]
    absolute_tolerance: 0.01
    relative_tolerance: 0.005

String Normalization

Handle messy text data by cleaning it before the comparison.

rules:
  - column_names: ["user_email"]
    case_insensitive: true
    whitespace_mode: "both"
    regex_replace:
      "\\.com$": ".net"  # Example regex sanitization

Value Mapping (Crosswalks)

Translate legacy Enums to modern values.

rules:
  - column_names: ["status_code"]
    value_map:
      "0": "INACTIVE"
      "1": "ACTIVE"