Welcome to Veridelta

Semantic diffing for mission-critical data pipelines.

Veridelta is a high-performance data comparison engine designed to validate changes between datasets. Built on top of Polars, it allows you to define explicit rules for expected variance, like floating-point jitter or casing differences, so you can ignore the noise and focus on real regressions.


Key Features

  • Blazing Fast: Powered by the Rust-backed Polars engine to handle massive datasets.
  • Declarative Rules: Define tolerances, string normalization, and null handling in simple YAML.
  • Dual-Entry: Use it as a CLI tool in CI/CD pipelines or as a Python library in Airflow and notebooks.
  • Flexible Schema Modes: Handle evolving datasets with support for additions, removals, and strict matching.

Installation

Install via uv (recommended):

uv add veridelta

Or using pip:

pip install veridelta

Quick Start

  1. Define your rules in a veridelta.yaml file.
  2. Run the comparison: veridelta run --config veridelta.yaml
  3. Review the summary in your terminal or check the output_path for detailed Parquet diffs.
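If you want to trigger step 2 from a Python script (for example, a CI helper), one option is to shell out to the CLI. The sketch below wraps `subprocess.run` in a hypothetical helper, `run_check`. It assumes the CLI exits with a non-zero status when the comparison fails, which is a common convention but is not confirmed here.

```python
import subprocess
import sys
from typing import Sequence


def run_check(command: Sequence[str]) -> bool:
    """Run a command, echo its output, and report whether it exited with 0.

    Hypothetical helper; assumes a non-zero exit status means failure.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.stdout:
        print(result.stdout, end="")
    if result.returncode != 0 and result.stderr:
        print(result.stderr, end="", file=sys.stderr)
    return result.returncode == 0


# Usage with the command from step 2 (requires veridelta on PATH):
#   ok = run_check(["veridelta", "run", "--config", "veridelta.yaml"])
#   sys.exit(0 if ok else 1)
```

Passing the command as a list avoids shell quoting issues and keeps the helper easy to test with any executable.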

Explore the Docs

  • Configuration Guide: Learn how to write rules for tolerances, strings, and schemas.
  • API Reference: Detailed documentation of the Python classes and methods.
  • Roadmap: Discover upcoming features like warehouse pushdown, Lakehouse support, and advanced ML heuristics.