pmandera / duometer
Near-duplicate detection tool
☆24 Updated 8 years ago
Alternatives and similar repositories for duometer
Users who are interested in duometer are comparing it to the repositories listed below.
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆17 Updated 9 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆46 Updated 7 years ago
- A queue-controlled browser automation tool for improving web crawl quality ☆61 Updated last week
- Easily identify and label sentence intervals using various taggers. ☆16 Updated 8 years ago
- Topic modeling web application ☆41 Updated 10 years ago
- Check out https://github.com/webrecorder/webrecorder for newer version matching https://webrecorder.io ☆38 Updated 9 years ago
- A pipeline for crawling of RSS feeds and the associated content. Demo at newsfeed.ijs.si. ☆21 Updated 12 years ago
- Simple taxonomy management tool and document classifier. ☆56 Updated 5 years ago
- Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your com… ☆130 Updated 5 months ago
- Json Wikipedia, contains code to convert the Wikipedia xml dump into a json dump. Questions? https://gitter.im/idio-opensource/Lobby ☆17 Updated 3 years ago
- General Architecture for Text Engineering ☆50 Updated 9 years ago
- ScraperWiki Python library for scraping and saving data ☆159 Updated 2 years ago
- stav text annotation visualiser ☆34 Updated 13 years ago
- Semanticizest: dump parser and client ☆20 Updated 9 years ago
- Vizlinc ☆15 Updated 9 years ago
- Quickly analyze and explore email with advanced analytics and visualization. ☆56 Updated 3 years ago
- Collects multimedia content shared through social networks. ☆19 Updated 10 years ago
- A platform for collecting, analyzing, and visualizing social media data. ☆12 Updated 4 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 Updated 3 years ago
- An OpenCalais API Interface for Python. ☆20 Updated 13 years ago
- Tools for tracking stories on news homepages ☆48 Updated 5 years ago
- Suite of tools for detecting changes in web pages and their rendering ☆54 Updated last year
- Just like on ScraperWiki Classic; now a part of QuickCode. ☆38 Updated 9 years ago
- Contains the implementation of algorithms that estimate the geographic location of media content based on their content and metadata. It … ☆15 Updated 8 years ago
- A project for clustering text streams using locality-sensitive hashing (LSH) in Python ☆26 Updated 13 years ago
- Solrstrap is a Query-Result interface for Solr written in JavaScript, HTML and CSS ☆87 Updated 8 years ago
- Presentations on Quantified Self and Self-Tracking with Python ☆30 Updated 2 years ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 Updated 11 years ago
- extensible Web Retrieval Toolkit ☆17 Updated 3 years ago
- The news homepage archive ☆80 Updated 3 years ago