albertjuhe / charikars_algorithm
Detecting near duplicates using Moses Charikar's algorithm
☆20 Updated 10 years ago
Alternatives and similar repositories for charikars_algorithm
Users that are interested in charikars_algorithm are comparing it to the libraries listed below
- Elasticsearch Latent Semantic Indexing experimentation ☆33 Updated 5 years ago
- An open relation extraction system ☆46 Updated 3 years ago
- An Apache Lucene TokenFilter that uses word2vec vectors for term expansion. ☆24 Updated 11 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 Updated 3 years ago
- Web page segmentation and noise removal ☆55 Updated last year
- Python API for Various DB-Backed Simhash Clusters ☆64 Updated 8 years ago
- NER tagger for English, Spanish, Dutch, Italian, German and French. ☆35 Updated 9 years ago
- stop word lists in several languages ☆21 Updated 8 years ago
- An index data structure for approximate string search. ☆23 Updated 6 years ago
- Latent Dirichlet Allocation on tweets ☆15 Updated 10 years ago
- Tool for tweaking dbpedia spotlight's models ☆16 Updated 7 years ago
- Solr Dictionary Annotator (Microservice for Spark) ☆71 Updated 5 years ago
- NLP tools developed by Emory University. ☆60 Updated 8 years ago
- An OpenCalais API Interface for Python. ☆20 Updated 13 years ago
- Near-Duplicate Detection in Python. ☆25 Updated 3 years ago
- Scalable String Similarity Joins in Python ☆39 Updated 10 months ago
- framework for doing NER and other types of entity recognition, in Python ☆68 Updated 2 years ago
- Additional opennlp mapping type for elasticsearch in order to perform named entity recognition ☆136 Updated 9 years ago
- Using latent Dirichlet allocation (LDA) in Apache Lucene ☆58 Updated 12 years ago
- Using word2vec and t-SNE to compare text sources. ☆20 Updated 9 years ago
- Raw Wikipedia counts for entity linking ☆19 Updated 8 years ago
- Build tables of information by extracting facts from indexed text corpora via a simple and effective query language. ☆56 Updated 6 years ago
- [NO LONGER MAINTAINED AS OPEN SOURCE - USE SCALETEXT.COM INSTEAD] ☆108 Updated 12 years ago
- Using Word2Vec on lists and sets ☆34 Updated this week
- ☆46 Updated 8 years ago
- Hidden alignment conditional random field for classifying string pairs. ☆24 Updated 3 weeks ago
- Entity Linking for the masses ☆56 Updated 9 years ago
- Dice.com repo to accompany the dice.com 'Vectors in Search' talk by Simon Hughes, from the Activate 2018 search conference, and the 'Sear… ☆85 Updated 4 years ago
- Implicit relation extractor using a natural language model. ☆25 Updated 7 years ago
- A tool for semantic relation extraction. The program finds pairs of semantically related words based on the text definitions coming from … ☆26 Updated 10 years ago