Scalable String Similarity Joins in Python
☆39 · Jul 12, 2024 · Updated last year
Alternatives and similar repositories for py_stringsimjoin
Users that are interested in py_stringsimjoin are comparing it to the libraries listed below.
- A comprehensive and scalable set of string tokenizers and similarity measures in Python ☆143 · Feb 18, 2026 · Updated last month
- ☆192 · May 29, 2024 · Updated last year
- Python package for performing Entity and Text Matching using Deep Learning. ☆615 · Jun 18, 2024 · Updated last year
- Implementation of many similarity join algorithms. ☆15 · Mar 6, 2014 · Updated 12 years ago
- A browser user interface for manual labeling of record pairs. ☆48 · Jun 23, 2023 · Updated 2 years ago
- Implementation of Shake-Shake by chainer (Shake-Shake regularization of 3-branch residual networks: https://openreview.net/forum?id=HkO-P… ☆10 · Aug 24, 2017 · Updated 8 years ago
- Asynchronous financial data management ☆21 · Oct 3, 2017 · Updated 8 years ago
- Learning String Alignments for Entity Aliases ☆37 · Mar 21, 2019 · Updated 7 years ago
- Code for extracting data from a large number of PDFs, particularly FCC political ad documents ☆15 · Oct 26, 2017 · Updated 8 years ago
- Build-to-Order BLAS ☆12 · Apr 9, 2019 · Updated 6 years ago
- ☆14 · Feb 1, 2024 · Updated 2 years ago
- ☆16 · Jan 7, 2021 · Updated 5 years ago
- ☆15 · Apr 6, 2018 · Updated 7 years ago
- Simple approximate-nearest-neighbours in Python using locality sensitive hashing. ☆141 · Jun 21, 2012 · Updated 13 years ago
- ☆14 · Dec 27, 2022 · Updated 3 years ago
- Constrained episodic reinforcement learning in concave-convex and knapsack settings ☆11 · Oct 3, 2023 · Updated 2 years ago
- A script to generate tagged XML Citationstrings for citation parsing ☆20 · Apr 17, 2020 · Updated 5 years ago
- Utility for working with DOSDP design patterns and OWL ontologies ☆29 · Dec 10, 2025 · Updated 3 months ago
- Levenshtein distance between two strings in julia ☆14 · May 15, 2019 · Updated 6 years ago
- Dyna built on R-exprs (First Prototype) ☆17 · Mar 7, 2022 · Updated 4 years ago
- Daily refreshed data on representation certification and unfair labor cases from nlrb.gov ☆21 · Nov 13, 2025 · Updated 4 months ago
- python3 package supporting efficient storage and querying of sets of sets using the trie data structure. Supports finding all the superse… ☆23 · Sep 15, 2023 · Updated 2 years ago
- Visualizations of character embeddings from derived character vectors. ☆13 · Apr 4, 2017 · Updated 8 years ago
- Geopandas and Shapely ☆10 · Jul 29, 2018 · Updated 7 years ago
- A list of free data matching and record linkage software. ☆400 · Feb 21, 2024 · Updated 2 years ago
- ☆25 · Aug 20, 2025 · Updated 7 months ago
- A powerful and modular toolkit for record linkage and duplicate detection in Python ☆1,048 · Feb 21, 2024 · Updated 2 years ago
- Enrich sf data with geographic features from OpenStreetMaps. ☆19 · Dec 21, 2021 · Updated 4 years ago
- A collection of Python scripts ☆12 · Feb 7, 2020 · Updated 6 years ago
- Command line utility for d3-pre pre-rendering pipeline ☆13 · Jul 14, 2016 · Updated 9 years ago
- A ChatGPT plugin for Solana ☆13 · Jun 1, 2023 · Updated 2 years ago
- This repo contains my part of the code for our winning entry in the TensorFlow Speech Recognition Challenge hosted by kaggle ☆19 · Aug 27, 2018 · Updated 7 years ago
- Entitypedia is an Extended Named Entity Dictionary from Wikipedia. ☆13 · Dec 7, 2022 · Updated 3 years ago
- A bunch of tools for automating parts of a Systematic Review of scientific literature ☆14 · Sep 16, 2020 · Updated 5 years ago
- Bart vs. Homer recognition task to spot and fix data leakage. ☆25 · Nov 22, 2018 · Updated 7 years ago
- A reference implementation of algorithms for distributions over spanning trees. ☆21 · Mar 10, 2020 · Updated 6 years ago
- Make your research data and code FAIR with the UU FAIR Cheatsheets! ☆16 · Apr 10, 2024 · Updated last year
- A curated list of awesome Citizen Science Projects in the Netherlands ☆20 · May 4, 2021 · Updated 4 years ago
- Generate colour palettes from Rijksmuseum paintings ☆15 · May 4, 2021 · Updated 4 years ago