NickCrews / mismo
The SQL/Ibis powered sklearn of record linkage
☆15 · Updated 2 weeks ago
Alternatives and similar repositories for mismo:
Users interested in mismo are comparing it to the libraries listed below:
- Blocking records for record linkage and data deduplication based on ANN algorithms in Python. ☆13 · Updated this week
- Fast, accurate, open-source geocoding in Python ☆35 · Updated 2 weeks ago
- A maximum-strength name parser for record linkage. ☆37 · Updated last week
- Jupyter Cell / Line Magics for DuckDB ☆48 · Updated 3 months ago
- Linear regression in SQL using dbt ☆70 · Updated 3 months ago
- A serverless DuckDB deployment on GCP ☆39 · Updated 2 years ago
- ☆18 · Updated last year
- pseudopeople is a Python package that generates realistic simulated data about a fictional United States population, designed for use in … ☆21 · Updated this week
- Ibis analytics, with Ibis (and more!) ☆21 · Updated 7 months ago
- Write your dbt models using Ibis ☆65 · Updated last month
- List of entity resolution software and resources. ☆65 · Updated 2 months ago
- This repo contains information about DuckDB extensions found on GitHub. Refreshed daily ☆97 · Updated this week
- DuckDB API integrations ☆31 · Updated 2 months ago
- Interactive notebooks containing demonstration code of the splink library ☆38 · Updated last year
- A Singer.io target for DuckDB ☆17 · Updated 2 months ago
- ☆16 · Updated last month
- A browser user interface for manual labeling of record pairs. ☆47 · Updated last year
- DuckDB API Server with Arrow Flight SQL Airport support and concurrent writes/reads (quackpipe) ☆73 · Updated 2 months ago
- Full spreadsheet-style pivot table through SQL macros. Just specify values, rows, columns, and filters! ☆13 · Updated 7 months ago
- Examples for the MotherDuck WASM Client library, enabling MotherDuck integration for WebAssembly-powered DuckDB ☆56 · Updated 3 months ago
- A repository of runnable examples using ibis ☆43 · Updated 10 months ago
- ☆31 · Updated 2 months ago
- An experimental Athena extension for DuckDB 🐤 ☆54 · Updated 4 months ago
- pyspark-parallelised functions producing graph-theoretical metrics in connected component clusters for use in record-linkage (or other do… ☆10 · Updated last year
- The Modern Data Stack in a (Smaller) Box ☆12 · Updated 2 years ago
- A DuckDB extension implementing a full-service AI/ML engine ☆26 · Updated 6 months ago
- A piped SQL for DuckDB ☆80 · Updated last month
- A high-performance data streaming system using DuckDB and Apache Arrow Flight. ☆77 · Updated 2 months ago
- DuckDB extension to read files within zip archives. ☆31 · Updated 2 weeks ago
- ☆90 · Updated last year