zyocum / dedup
Find duplicate text files.
☆14 Updated 4 months ago
Alternatives and similar repositories for dedup
Users who are interested in dedup are comparing it to the repositories listed below.
- NSS Capstone project to use natural language modeling, classification, and information extraction to get the exact employee count values … ☆15 Updated 6 years ago
- This repository contains the DFKI Product Corpus, a dataset of 174 documents annotated for product and company named entities, and the re… ☆12 Updated 8 months ago
- Code and data for Teddy https://arxiv.org/abs/2001.05171. ☆15 Updated 2 years ago
- Crawling engine that crawls a set of top-level domains looking for documents in a list of languages ☆11 Updated last year
- Classify a job description (or noisy job title) into an ONET job title ☆19 Updated 8 years ago
- Sequence tagging with spaCy and crfsuite ☆19 Updated 2 years ago
- This is a prototype of a multi-lingual suite for named-entity recognition in Python. ☆21 Updated last year
- Tool for sentiment analysis annotation ☆12 Updated 2 months ago
- Loan Risk Prediction Neural Network and API ☆17 Updated 4 years ago
- A set of NLP tools created during my medium NLP Explanation series. ☆31 Updated last year
- ☆18 Updated 7 years ago
- A Python package to get useful information from documents using TopicRank Algorithm. ☆16 Updated last year
- A Python library to generate highly realistic typos (fuzz-testing) ☆11 Updated 2 months ago
- Text pattern search using marisa-trie ☆18 Updated 4 months ago
- Scripts for building a geo-located web corpus using Common Crawl data ☆11 Updated last month
- Instructions for how to convert a BERT TensorFlow model to work with HuggingFace's pytorch-transformers, and spaCy. This walk-through use… ☆27 Updated 2 years ago
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆22 Updated 2 years ago
- A tool for detecting sentence fragments. ☆7 Updated 8 years ago
- An index data structure for approximate string search. ☆23 Updated 6 years ago
- Code examples for Google Natural Language API. ☆13 Updated 5 years ago
- A set of methods for finding an appropriate number of topics in a text collection ☆16 Updated last month
- Interpretable feature construction from taxonomies for text classification ☆18 Updated 3 years ago
- Tools and services for evaluating topic models ☆15 Updated 9 years ago
- This is a document concerning Data Readiness in the context of machine learning and Natural Language Processing. ☆11 Updated 3 years ago
- Text preprocessing tools in Python. ☆27 Updated 7 years ago
- BERT models for many languages created from Wikipedia texts ☆33 Updated 5 years ago
- Model for predicting categories of entities by their mentions ☆29 Updated 3 years ago
- Given a text, wrap it into phrases and send them to Yandex's search engine. If it yields a "did you mean:", substitute the original phras… ☆11 Updated 6 years ago
- Python wrapper for Facebook's duckling ☆24 Updated 9 months ago
- Aho-Corasick string replacement utility ☆24 Updated 5 years ago