dell-research-harvard / linktransformer
A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning
☆113 Updated 3 weeks ago
Alternatives and similar repositories for linktransformer
Users who are interested in linktransformer are comparing it to the libraries listed below.
- Entity Matching Model solves the problem of matching company names between two possibly very large datasets. ☆67 Updated 2 weeks ago
- Innovation across ages ☆68 Updated last year
- Tool for probabilistically linking the records of individual entities (e.g. people) within and across datasets ☆109 Updated 2 months ago
- An End-to-End Evaluation Framework for Entity Resolution Systems ☆26 Updated last year
- Code for measuring novelty in science using publication text ☆24 Updated 2 weeks ago
- A Python package to enrich Twitter Data ☆74 Updated last year
- Fast, flexible name matching for large datasets ☆70 Updated last year
- PatentSBERTa: A Deep NLP based Hybrid Model for Patent Distance and Classification using Augmented SBERT ☆77 Updated 3 months ago
- Tools for interactive visual exploration of semantic embeddings. ☆30 Updated 5 months ago
- Python package for text mining of time-series data ☆69 Updated 2 months ago
- KeypartX is a graph-based approach to represent perception (text in general) by key parts of speech. Updated last year
- Name matching is a Python package for the matching of company names. This package has been developed to match the names of companies from… ☆143 Updated this week
- A Flexible Deep Learning Approach to Fuzzy String Matching ☆141 Updated 4 months ago
- Nesta's Skills Extractor Library ☆126 Updated 3 months ago
- Noise-robust de-duplication at scale ☆16 Updated last year
- Code base for constructing narrative statements from text ☆104 Updated last year
- A tutorial on entity resolution (record linkage or de-duplication) ☆63 Updated 4 years ago
- A shared repository for data cleaning scripts used for innovation data. ☆29 Updated 3 years ago
- Natural language processing tools developed by the World Bank's DECAT unit. A suite of text preprocessing and cleaning algorithms for NLP… ☆10 Updated 2 years ago
- Course repository for the session "Hands-on Transformers: Fine-Tune your own BERT and GPT" of the Data Science Summer School 2023 ☆83 Updated last year
- This repository contains the raw data, code, and sources used to create an individual level and state municipal incorporation date datase… ☆23 Updated 9 months ago
- Powerful topic model visualization in Python ☆110 Updated 3 weeks ago
- ConfliBERT: A Pre-trained Language Model for Political Conflict and Violence (NAACL 2022) ☆30 Updated 2 weeks ago
- Python package to interact with Factiva news-related APIs. Services are described in the Dow Jones Developer Platform. ☆18 Updated 2 years ago
- Given a job title and job description, the algorithm assigns a standard occupational classification (SOC) code to the job. ☆73 Updated 7 months ago