pmandera / duometer
Near-duplicate detection tool
☆24 Updated 8 years ago
Alternatives and similar repositories for duometer
Users that are interested in duometer are comparing it to the libraries listed below
- bigram / trigram analysis of wikipedia; mainly mutual info ☆22 Updated 13 years ago
- KEA 5.0 (keyphrase extraction software), modified to be an XML-RPC service ☆42 Updated 13 years ago
- Quickly analyze and explore email with advanced analytics and visualization. ☆56 Updated 3 years ago
- Easily identify and label sentence intervals using various taggers. ☆16 Updated 8 years ago
- Wikipedia API wrapper for humans and elk. (en.wikipedia.org/w/api.php, get it?) ☆36 Updated 10 years ago
- Pipeline for distributed Natural Language Processing, made in Python ☆65 Updated 8 years ago
- Algorithms for "schema matching" ☆26 Updated 8 years ago
- PDF Extraction Toolkit ☆41 Updated 4 years ago
- ☆21 Updated 7 years ago
- framework for making streamcorpus data ☆11 Updated 8 years ago
- CROMER (CROss-document Main Events and entities Recognition), is a tool for cross-document coreference ☆12 Updated 10 years ago
- List of online / computer-based annotation tools ☆18 Updated 8 years ago
- common data interchange format for document processing pipelines that apply natural language processing tools to large streams of text ☆35 Updated 8 years ago
- Stylometric framework in Python ☆17 Updated 10 years ago
- A lecture I gave at PyData NYC 2012 on using the networkx python library and Gephi to generate a mapping of the python community on Twitt… ☆28 Updated 12 years ago
- U.S. Code Complexity ☆23 Updated 11 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆25 Updated 7 years ago
- Personal Knowledge Management System. Capture your ideas using plain old text files. Make a journal that lasts 100 years. ☆29 Updated last year
- A browser extension providing Open Access bibliographical services ☆17 Updated 2 years ago
- Extract statistics from Wikipedia Dump files. ☆26 Updated 3 years ago
- Collects multimedia content shared through social networks. ☆19 Updated 10 years ago
- A platform for collecting, analyzing, and visualizing social media data. ☆12 Updated 4 years ago
- Automated NLP sentiment predictions- batteries included, or use your own data ☆18 Updated 7 years ago
- Check out https://github.com/webrecorder/webrecorder for newer version matching https://webrecorder.io ☆38 Updated 9 years ago
- Tool to cleanse and semantify datasets from CKAN repositories. Based on OpenRefine. ☆23 Updated 9 years ago
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format) ☆65 Updated 8 years ago
- Google Refine extension for adding columns (extending data) from DBpedia ☆39 Updated 11 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆16 Updated 9 years ago
- Virtual patent marking crawler at iproduct.epfl.ch ☆14 Updated 7 years ago
- Keyword Extraction system using Brown Clustering - (This version is trained to extract keywords from job listings) ☆18 Updated 10 years ago