Welcome to Veridelta

Semantic diffing for mission-critical data pipelines.

Veridelta is a high-performance data comparison engine that validates changes between datasets. It is built on Polars and lets you define explicit rules for expected variance, such as floating-point jitter or casing differences, so you can ignore the noise and focus on real regressions.
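
To make the idea of "expected variance" concrete, here is a minimal sketch of a tolerance-aware comparison in plain Python. It is an illustration of the concept only, not Veridelta's implementation or API: the names values_match, diff_rows, abs_tol, and ignore_case are hypothetical, and the standard library stands in for Polars to keep the example dependency-free.

```python
import math


def values_match(left, right, *, abs_tol=0.0, ignore_case=False):
    """Return True when two cell values are equal under the given rules.

    Hypothetical helper for illustration; not part of Veridelta.
    """
    # Two nulls are treated as equal; a null never matches a non-null value.
    if left is None or right is None:
        return left is None and right is None
    # Numeric cells: allow a small absolute difference (floating-point jitter).
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return math.isclose(left, right, rel_tol=0.0, abs_tol=abs_tol)
    # String cells: optionally compare case-insensitively.
    if ignore_case and isinstance(left, str) and isinstance(right, str):
        return left.casefold() == right.casefold()
    return left == right


def diff_rows(source, target, key, rules):
    """Join rows on `key` and list (key, column, left, right) mismatches."""
    target_by_key = {row[key]: row for row in target}
    mismatches = []
    for row in source:
        other = target_by_key.get(row[key])
        if other is None:
            continue  # rows missing from target are out of scope for this sketch
        for column, value in row.items():
            if column == key:
                continue
            column_rules = rules.get(column, {})
            if not values_match(value, other.get(column), **column_rules):
                mismatches.append((row[key], column, value, other.get(column)))
    return mismatches


source = [
    {"id": 1, "price": 10.0, "city": "Paris"},
    {"id": 2, "price": 20.0, "city": "Berlin"},
]
target = [
    {"id": 1, "price": 10.0000001, "city": "PARIS"},  # jitter and casing only
    {"id": 2, "price": 25.0, "city": "Berlin"},       # a real regression
]
rules = {"price": {"abs_tol": 1e-6}, "city": {"ignore_case": True}}

print(diff_rows(source, target, "id", rules))  # [(2, 'price', 20.0, 25.0)]
```

Row 1 differs only by jitter and casing, so the rules suppress it; row 2 changes by more than the tolerance and is the only difference reported.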

Key Features

Blazing Fast: Powered by a Rust-backed Polars engine for massive dataset handling.
Declarative Rules: Define tolerances, string normalization, and null handling in simple YAML.
Dual-Entry: Use it as a CLI tool in CI/CD pipelines or as a Python library in Airflow DAGs and notebooks.
Flexible Schema Modes: Handle evolving schemas by allowing additions, allowing removals, or requiring a strict match.
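
As a rough illustration of what schema modes can mean in practice, the sketch below checks two column lists against a chosen mode. The mode names ("strict", "allow_additions", "allow_removals") and the check_schema function are assumptions made for this example; the Configuration Guide is the authority on the actual option names.

```python
def check_schema(source_cols, target_cols, mode="strict"):
    """Return (ok, added, removed) for two column lists under a schema mode.

    Hypothetical helper; the mode names are illustrative, not Veridelta's.
    """
    source_set, target_set = set(source_cols), set(target_cols)
    added = sorted(target_set - source_set)    # columns only in target
    removed = sorted(source_set - target_set)  # columns only in source

    # Each mode says whether added and removed columns are tolerated.
    allowed = {
        "strict": (False, False),
        "allow_additions": (True, False),
        "allow_removals": (False, True),
    }
    if mode not in allowed:
        raise ValueError(f"unknown schema mode: {mode!r}")

    allow_added, allow_removed = allowed[mode]
    ok = (allow_added or not added) and (allow_removed or not removed)
    return ok, added, removed


old_cols = ["id", "price"]
new_cols = ["id", "price", "currency"]

print(check_schema(old_cols, new_cols, "strict"))           # (False, ['currency'], [])
print(check_schema(old_cols, new_cols, "allow_additions"))  # (True, ['currency'], [])
```

Under strict matching the new currency column is a failure; under a mode that permits additions the same change passes but is still reported.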

Installation

Install via uv (recommended):

uv add veridelta

Or using pip:

pip install veridelta

Quick Start

Define your rules in a veridelta.yaml file.
Run the comparison: veridelta run --config veridelta.yaml
Review the summary in your terminal, or open the detailed Parquet diffs written to output_path.
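
If you want to trigger the same CLI command from a Python script, for example in a CI job, a thin wrapper around subprocess is one option. This is a sketch under assumptions: build_command and run_comparison are hypothetical helpers, and only the command line itself comes from the step above. What the exit code signifies is not specified here, so the wrapper returns it without interpreting it.

```python
import subprocess


def build_command(config_path="veridelta.yaml"):
    """Build the CLI invocation shown in the Quick Start (hypothetical helper)."""
    return ["veridelta", "run", "--config", config_path]


def run_comparison(config_path="veridelta.yaml"):
    """Run the CLI and return (exit_code, stdout).

    The meaning of the exit code is an assumption left to the caller;
    consult the CLI documentation before gating a pipeline on it.
    """
    completed = subprocess.run(
        build_command(config_path),
        capture_output=True,  # collect the terminal summary instead of printing it
        text=True,            # decode output as str rather than bytes
    )
    return completed.returncode, completed.stdout
```

Calling run_comparison() requires the veridelta executable to be on PATH; otherwise subprocess.run raises FileNotFoundError.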

Explore the Docs

Configuration Guide: Learn how to write rules for tolerances, strings, and schemas.
API Reference: Detailed documentation of the Python classes and methods.
Roadmap: See planned features such as warehouse pushdown, Lakehouse support, and advanced ML heuristics.