Veridelta Roadmap
Veridelta is currently in v0.1.0 (Alpha). The core semantic diffing engine is stable, but we are actively expanding the ecosystem. Here is what is coming next:
1. Expanded Ecosystem Support
While CSV and Parquet cover the majority of file-based workflows, enterprise pipelines often operate directly on data warehouses and lakehouses.
- Native integration for Delta Lake and Apache Iceberg tables.
- Direct SQL pushdown for Snowflake and Databricks to diff massive datasets without pulling them into memory.
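To make the pushdown idea concrete, here is a minimal sketch of how a diff summary could be computed entirely inside the database, so that only three counts travel back to the client. This is not Veridelta's API: the function name `build_pushdown_diff_sql`, the table names, and the column names are hypothetical, and an in-memory SQLite database stands in for a warehouse such as Snowflake or Databricks.

```python
import sqlite3


def build_pushdown_diff_sql(source, target, key, columns):
    """Return one SQL statement that counts removed, added, and changed rows.

    Hypothetical helper for illustration only. Identifiers are interpolated
    directly, so they must come from trusted configuration, not user input.
    """
    if not columns:
        raise ValueError("at least one comparison column is required")
    # Plain `<>` skips rows where either side is NULL; a real implementation
    # would need a NULL-safe comparison, which is dialect-specific.
    changed = " OR ".join(f"s.{c} <> t.{c}" for c in columns)
    return (
        "SELECT "
        f"(SELECT COUNT(*) FROM {source} s LEFT JOIN {target} t "
        f"ON s.{key} = t.{key} WHERE t.{key} IS NULL) AS removed, "
        f"(SELECT COUNT(*) FROM {target} t LEFT JOIN {source} s "
        f"ON s.{key} = t.{key} WHERE s.{key} IS NULL) AS added, "
        f"(SELECT COUNT(*) FROM {source} s JOIN {target} t "
        f"ON s.{key} = t.{key} WHERE {changed}) AS changed"
    )


def run_demo():
    """Run the generated query against SQLite as a stand-in warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE src (id INTEGER PRIMARY KEY, name TEXT, amount INTEGER);
        CREATE TABLE tgt (id INTEGER PRIMARY KEY, name TEXT, amount INTEGER);
        INSERT INTO src VALUES (1, 'a', 10), (2, 'b', 20), (3, 'c', 30);
        INSERT INTO tgt VALUES (2, 'b', 25), (3, 'c', 30), (4, 'd', 40);
        """
    )
    sql = build_pushdown_diff_sql("src", "tgt", "id", ["name", "amount"])
    removed, added, changed = conn.execute(sql).fetchone()
    conn.close()
    return {"removed": removed, "added": added, "changed": changed}


if __name__ == "__main__":
    print(run_demo())
```

The point of the design is that the join and comparison run where the data lives; the client only ever sees a single summary row, regardless of table size.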
2. Advanced Heuristics
- Fuzzy String Matching: Support for Levenshtein distance thresholds to catch minor typos without explicit regex.
- Schema Evolution ML: Auto-suggest `value_map` dictionaries based on statistical sampling (e.g., auto-detecting that 'M' maps to 'Male' 99% of the time).
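The two heuristics above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the planned implementation: `levenshtein`, `fuzzy_equal`, and `suggest_value_map` are hypothetical names, the threshold defaults are arbitrary, and the mapping suggestion assumes a sample of (old value, new value) pairs has already been aligned by key.

```python
from collections import Counter, defaultdict


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def fuzzy_equal(a: str, b: str, max_distance: int = 1) -> bool:
    """Treat two strings as matching if they are within the threshold."""
    return levenshtein(a, b) <= max_distance


def suggest_value_map(pairs, min_confidence: float = 0.99):
    """Suggest old -> new mappings that hold in at least min_confidence of samples.

    `pairs` is an iterable of (old_value, new_value) tuples drawn from rows
    that were already matched by key (an assumption of this sketch).
    """
    counts = defaultdict(Counter)
    for old, new in pairs:
        counts[old][new] += 1
    suggestions = {}
    for old, seen in counts.items():
        new, hits = seen.most_common(1)[0]
        if new != old and hits / sum(seen.values()) >= min_confidence:
            suggestions[old] = new
    return suggestions
```

For example, `fuzzy_equal("Smith", "Smyth")` accepts a one-character typo at the default threshold, while `suggest_value_map` would only propose mapping 'M' to 'Male' if that pairing dominates the sample at the chosen confidence level.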
3. CI/CD & Reporting
- GitHub Actions App: A native GitHub Action that comments on PRs with a mini Veridelta diff summary when pipeline code changes.
- HTML Dashboards: An optional `--html` flag in the CLI to generate a standalone, interactive web report of the diff results.