moj-analytical-services / splink_graph
pyspark-parallelised functions producing graph-theoretical metrics in connected component clusters for use in record-linkage (or other domains)
☆10 Updated last year
Alternatives and similar repositories for splink_graph:
Users who are interested in splink_graph are comparing it to the libraries listed below.
- A dbt package designed to help SQL-based analysis of graphs ☆20 Updated last year
- This project is a wrapper for Leilex, a legal entity identifier API. Includes ISIN-LEI conversion. Search for an LEI number using a company name. ☆24 Updated 6 months ago
- The SQL/Ibis powered sklearn of record linkage ☆15 Updated this week
- ☆18 Updated last year
- Interactive notebooks containing demonstration code of the splink library ☆38 Updated last year
- An End-to-End Evaluation Framework for Entity Resolution Systems ☆27 Updated last year
- Clustering and Link Prediction Evaluation in R ☆12 Updated last year
- Build your feature store with macros right within your dbt repository ☆38 Updated 2 years ago
- dbt package mimicking dplyr select-helpers semantics ☆140 Updated 8 months ago
- Fully unit tested utility functions for data engineering. Python 3 only. ☆16 Updated 8 months ago
- dbt-generator - Generate and transform base models for dbt project ☆46 Updated 2 years ago
- Performs unique entity estimation corresponding to Chen, Shrivastava, Steorts (2018). ☆14 Updated 6 years ago
- Ibis analytics, with Ibis (and more!) ☆21 Updated 7 months ago
- [DEPRECATED] A dbt adapter for Excel. ☆92 Updated 2 weeks ago
- An experimental Athena extension for DuckDB 🐤 ☆54 Updated 3 months ago
- Collect and combine data for analysis of the Medicare Advantage market from 2008 through 2015. ☆10 Updated 2 years ago
- ☆12 Updated 4 years ago
- Perform Bayesian record linkage with a one-to-one matching assumption. ☆11 Updated 4 years ago
- ☆14 Updated this week
- Linear regression in SQL using dbt ☆70 Updated 3 months ago
- 📦 Example repository showing how to use dbt inside Visual Studio Code development containers ☆41 Updated 4 months ago
- A Python package to create a database on the platform using our moj data warehousing framework ☆21 Updated 7 months ago
- dbt starter code for enterprise Snowflake usage data artifacts ☆22 Updated 2 years ago
- A serverless DuckDB deployment on GCP ☆39 Updated 2 years ago
- Blocking records for record linkage and data deduplication based on ANN algorithms in Python. ☆12 Updated this week
- Efficient String Comparison Functions and Fuzzy String Matching ☆17 Updated 3 years ago
- A repository for the best data content, from data science to data engineering ☆21 Updated 2 months ago
- MOVED TO GITLAB. A list/directory of awesome/helpful Looker and LookML work. ☆19 Updated 3 years ago
- A maximum-strength name parser for record linkage. ☆36 Updated 3 weeks ago
- ☆15 Updated 2 years ago