anhaidgroup / py_stringsimjoin
Scalable String Similarity Joins in Python
☆39 · Updated 9 months ago
Alternatives and similar repositories for py_stringsimjoin:
Users who are interested in py_stringsimjoin are comparing it to the libraries listed below.
- A comprehensive and scalable set of string tokenizers and similarity measures in Python ☆137 · Updated 8 months ago
- Hidden alignment conditional random field for classifying string pairs. ☆24 · Updated 6 months ago
- A Cython implementation of the affine gap string distance ☆57 · Updated 2 years ago
- ☆30 · Updated 2 years ago
- Algorithms for "schema matching" ☆26 · Updated 8 years ago
- 🧬 A VS Code extension for annotating data with Prodigy ☆30 · Updated 3 years ago
- A maximum-strength name parser for record linkage. ☆36 · Updated last week
- Python wrapper for a C++ Double Metaphone ☆15 · Updated 2 years ago
- ☆189 · Updated 10 months ago
- Python package for deduplication/entity resolution using active learning ☆78 · Updated 7 months ago
- A simple command line interface to the datamade/dedupe library. ☆42 · Updated 2 years ago
- Python package for deduplication using flexible text matching and cleaning in pandas DataFrames ☆25 · Updated 4 years ago
- A browser user interface for manual labeling of record pairs. ☆46 · Updated last year
- Python package aiding in entity disambiguation based on string and location matching ☆18 · Updated last year
- A Python package for efficient evaluation based on OASIS (Optimal Asymptotic Sequential Importance Sampling). ☆15 · Updated 3 years ago
- A disk-based key/value store in Python with no dependencies. ☆21 · Updated 9 years ago
- Python 3 package supporting efficient storage and querying of sets of sets using the trie data structure. Supports finding all the superse… ☆23 · Updated last year
- Tutorial code and data for the entity resolution workshops. ☆45 · Updated 9 years ago
- Graph Engine for Exploration and Search ☆40 · Updated last year
- Extract city and country mentions from text, like GeoText, but using FlashText (an Aho-Corasick implementation) instead of regex. ☆62 · Updated last week
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 · Updated 5 years ago
- A repository for the "Combining DBpedia and Topic Modeling" GSoC 2016 idea ☆13 · Updated 8 years ago
- Generic Environment for Context-Aware Correction of Orthography ☆22 · Updated 2 years ago
- Set-oriented Operations in Pandas ☆24 · Updated 4 years ago
- Notebooks configured to be run with Binder, usually found on my blog. ☆42 · Updated 2 years ago
- ALMa (Active Learning Manager) keeps track of labeled and unlabeled data for active learning ☆41 · Updated 4 years ago
- An index data structure for approximate string search. ☆23 · Updated 5 years ago
- Search 'from' and 'to' strings to learn a text cleaning mapping ☆17 · Updated 9 years ago
- ☆10 · Updated 4 years ago
- ☄️ Parallel and distributed training with spaCy and Ray ☆53 · Updated last year