pmandera / duometer
Near-duplicate detection tool
☆24 Updated 9 years ago
Alternatives and similar repositories for duometer
Users who are interested in duometer are comparing it to the libraries listed below.
- General Architecture for Text Engineering ☆49 Updated 9 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆17 Updated 10 years ago
- Simple taxonomy management tool and document classifier. ☆56 Updated 5 years ago
- Topic modeling web application ☆40 Updated 10 years ago
- Quickly analyze and explore email with advanced analytics and visualization. ☆55 Updated 4 years ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 Updated 11 years ago
- A platform for collecting, analyzing, and visualizing social media data. ☆12 Updated 4 years ago
- Focused Crawler for VT's CTRNet ☆10 Updated 12 years ago
- Just like on ScraperWiki Classic; now a part of QuickCode. ☆38 Updated 9 years ago
- Parser for KAF NAF files written in Python ☆16 Updated 4 years ago
- Raw Wikipedia counts for entity linking ☆19 Updated 8 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED] ☆25 Updated 9 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆46 Updated 8 years ago
- Tools for tracking stories on news homepages ☆48 Updated 6 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 Updated last year
- bigram / trigram analysis of wikipedia; mainly mutual info ☆22 Updated 13 years ago
- A pipeline for crawling of RSS feeds and the associated content. Demo at newsfeed.ijs.si. ☆21 Updated 13 years ago
- A queue-controlled browser automation tool for improving web crawl quality ☆63 Updated 4 months ago
- Easily identify and label sentence intervals using various taggers. ☆16 Updated 8 years ago
- Wrapper to pocketsphinx phoneme labeling tools ☆18 Updated 9 years ago
- Collects multimedia content shared through social networks. ☆19 Updated 10 years ago
- Vizlinc ☆15 Updated 9 years ago
- Exploits Wikipedia's daily view counts to find out what topics are current trends ☆18 Updated 12 years ago
- Extracts character names from a text file and performs analysis of text sentences containing the names. ☆54 Updated 2 years ago
- Navigating around a grid of cells like XPath for spreadsheets; supports Python 3.5+ ☆48 Updated 2 years ago
- OpenBlock is a web application and RESTful service that allows users to browse and search their local area for "hyper-local news ☆61 Updated 4 years ago
- ☆14 Updated 4 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆95 Updated 7 years ago
- Homebase of the IPTC EXTRA project about rule-based text categorization ☆13 Updated 8 years ago
- DKPro WSD: A Java framework for word sense disambiguation ☆20 Updated 3 years ago