moj-analytical-services / splink
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
☆1,559 Updated last week
Alternatives and similar repositories for splink:
Users who are interested in splink are comparing it to the libraries listed below:
- Scalable identity resolution, entity resolution, data mastering and deduplication using ML ☆1,015 Updated this week
- dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org) ☆1,056 Updated this week
- Scalable and efficient data transformation framework - backwards compatible with dbt. ☆2,254 Updated this week
- A powerful and modular toolkit for record linkage and duplicate detection in Python ☆998 Updated last year
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… ☆2,072 Updated 3 weeks ago
- Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io ☆2,067 Updated this week
- Port(ish) of Great Expectations to dbt test macros ☆1,158 Updated 4 months ago
- do more with dbt. dbt-fal helps you run Python alongside dbt, so you can send Slack alerts, detect anomalies and build machine learning m… ☆850 Updated last year
- re_data - fix data issues before your users & CEO would discover them 😊 ☆1,562 Updated 11 months ago
- data load tool (dlt) is an open source Python library that makes data loading easy 🛠️ ☆3,490 Updated this week
- Dagster Labs' open-source data platform, built with Dagster. ☆344 Updated this week
- Malloy is an experimental language for describing data relationships and transformations. ☆2,124 Updated this week
- Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage/tracing and metada… ☆2,107 Updated 2 weeks ago
- A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton ☆861 Updated last year
- Repository for the ActivitySchema spec and supporting materials ☆416 Updated 2 years ago
- Turning PySpark Into a Universal DataFrame API ☆385 Updated last week
- Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to wr… ☆2,032 Updated this week
- ☆363 Updated last year
- What's in your data? Extract schema, statistics and entities from datasets ☆1,477 Updated last month
- List of `pre-commit` hooks to ensure the quality of your `dbt` projects. ☆641 Updated last week
- A list of free data matching and record linkage software.