dedupeio / dedupe
A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
☆4,322 · Updated 7 months ago
Alternatives and similar repositories for dedupe
Users who are interested in dedupe are comparing it to the libraries listed below.
- Examples for using the dedupe library ☆413 · Updated 10 months ago
- A powerful and modular toolkit for record linkage and duplicate detection in Python ☆1,011 · Updated last year
- Command line tool for deduplicating CSV files ☆423 · Updated 5 years ago
- a python library for parsing unstructured United States address strings into address components ☆1,577 · Updated 2 weeks ago
- Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends ☆1,629 · Updated this week
- 🪼 a python library for doing approximate and phonetic matching of strings. ☆2,141 · Updated last week
- NLP, before and after spaCy ☆2,226 · Updated last year
- a python library for parsing unstructured western names into name components. ☆606 · Updated last month
- Python bindings to libpostal for fast international address parsing/normalization ☆820 · Updated 4 months ago
- Python implementation of TextRank algorithms ("textgraphs") for phrase extraction ☆2,181 · Updated 2 weeks ago
- A toolkit for making domain-specific probabilistic parsers ☆803 · Updated 9 months ago
- A list of free data matching and record linkage software. ☆385 · Updated last year
- Python library for interactive topic model visualization. Port of the R LDAvis package. ☆1,835 · Updated 11 months ago
- Utils for streaming large files (S3, HDFS, gzip, bz2...) ☆3,328 · Updated 2 weeks ago
- Multilingual text (NLP) processing toolkit ☆2,345 · Updated last year
- the portable Python dataframe library ☆5,878 · Updated this week
- 📚 Parameterize, execute, and analyze notebooks ☆6,201 · Updated 2 months ago
- A system for quickly generating training data with weak supervision ☆5,873 · Updated last year
- Voilà turns Jupyter notebooks into standalone web applications ☆5,741 · Updated 3 weeks ago
- extract text from any document. no muss. no fuss. ☆4,174 · Updated 6 months ago
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… ☆8,399 · Updated 8 months ago
- Parallel computing with task scheduling ☆13,291 · Updated last week
- Quilt is a data mesh for connecting people with actionable data ☆1,341 · Updated this week
- Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark ☆1,514 · Updated 6 months ago
- Modin: Scale your Pandas workflows by changing a single line of code ☆10,197 · Updated this week
- Python Extract Transform and Load Tables of Data ☆1,273 · Updated last month
- Beautiful visualizations of how language differs among document types. ☆2,303 · Updated 2 months ago
- 🔮 A refreshing functional take on deep learning, compatible with your favorite libraries ☆2,858 · Updated 2 months ago
- Tools for diffing and merging of Jupyter notebooks. ☆2,751 · Updated 9 months ago
- 🦆 Contextually-keyed word vectors ☆1,654 · Updated 2 months ago