pmandera / duometer
Near-duplicate detection tool
☆23 Updated 8 years ago
Alternatives and similar repositories for duometer:
Users who are interested in duometer are comparing it to the libraries listed below:
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆16 Updated 9 years ago
- Pipeline for distributed Natural Language Processing, made in Python ☆65 Updated 8 years ago
- Google Refine extension for adding columns (extending data) from DBpedia ☆39 Updated 11 years ago
- Presentations on Quantified Self and Self-Tracking with Python ☆30 Updated 2 years ago
- The OpenSextant Gazetteer is a collection of world-wide place name data ☆12 Updated 7 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 Updated 3 years ago
- Extract statistics from Wikipedia Dump files. ☆26 Updated 3 years ago
- Raw Wikipedia counts for entity linking ☆19 Updated 7 years ago
- Easily identify and label sentence intervals using various taggers. ☆16 Updated 8 years ago
- U.S. Code Complexity ☆23 Updated 11 years ago
- Vizlinc ☆14 Updated 9 years ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 Updated 10 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆24 Updated 7 years ago
- Personal Knowledge Management System. Capture your ideas using plain old text files. Make a journal that lasts 100 years. ☆28 Updated last year
- Some convenient natural language tools that build on NLTK. ☆85 Updated 10 years ago
- An RDF Search Engine ☆57 Updated 7 years ago
- Virtual patent marking crawler at iproduct.epfl.ch ☆14 Updated 7 years ago
- A framework to allow the matching of string entities using customised sets of transformations and matchers, plus a tool to produce the ne… ☆31 Updated 7 years ago
- Includes Code for Inference and Evaluation of Topic Models for Selectional Preferences ☆16 Updated 2 years ago
- An index data structure for approximate string search. ☆23 Updated 5 years ago
- List of online / computer-based annotation tools ☆18 Updated 8 years ago
- framework for making streamcorpus data ☆11 Updated 8 years ago
- General Architecture for Text Engineering ☆48 Updated 9 years ago
- ☆14 Updated 3 years ago
- (Archived) A Python library for record linkage and deduplication. ☆19 Updated last year
- A system for connecting language to space and time. ☆64 Updated 4 years ago
- A workflow system for Natural Language Processing. ☆21 Updated 5 years ago
- Tools for tracking stories on news homepages ☆48 Updated 5 years ago
- Framework for creating and accessing UBY resources – sense-linked lexical resources in standard UBY-LMF format ☆22 Updated 6 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED] ☆24 Updated 8 years ago