pmandera / duometer
Near-duplicate detection tool
☆23 · Updated 7 years ago
Related projects
Alternatives and complementary repositories for duometer
- List of online / computer-based annotation tools ☆18 · Updated 7 years ago
- Google Refine extension for adding columns (extending data) from DBpedia ☆39 · Updated 11 years ago
- stav text annotation visualiser ☆34 · Updated 13 years ago
- U.S. Code Complexity ☆23 · Updated 11 years ago
- Bigram/trigram analysis of Wikipedia, mainly mutual information ☆22 · Updated 12 years ago
- Easily identify and label sentence intervals using various taggers. ☆16 · Updated 7 years ago
- Scraper built with Scrapy. ☆14 · Updated 3 months ago
- A financial disclosure data extraction tool. ☆13 · Updated last year
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 · Updated 2 years ago
- Extract statistics from Wikipedia dump files. ☆26 · Updated 3 years ago
- Calculate political polarization scores for members of the U.S. Congress based on their tweets ☆11 · Updated 7 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆24 · Updated 7 years ago
- The OpenSextant Gazetteer is a collection of worldwide place-name data ☆12 · Updated 6 years ago
- A glossary for the United States. ☆42 · Updated 9 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆14 · Updated 9 years ago
- Json Wikipedia contains code to convert the Wikipedia XML dump into a JSON dump. Questions? https://gitter.im/idio-opensource/Lobby ☆17 · Updated 2 years ago
- Crawling and analyzing data on Wikipedia ☆16 · Updated 8 months ago
- Keyword extraction system using Brown clustering (this version is trained to extract keywords from job listings) ☆18 · Updated 10 years ago
- Framework for creating and accessing UBY resources: sense-linked lexical resources in the standard UBY-LMF format ☆22 · Updated 6 years ago
- Stanford Entity-Resolution Framework ☆23 · Updated 6 years ago
- An index data structure for approximate string search. ☆23 · Updated 5 years ago
- A tool for calculating semantic similarity between words in a text corpus based on lexico-syntactic patterns. ☆28 · Updated 8 years ago
- Discussion Summarization is the process of condensing a text document, which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 · Updated 10 years ago
- Integration between Reaction ECommerce and Accelerated Text to provide product descriptions for an e-shop. ☆9 · Updated 3 years ago
- The news homepage archive ☆81 · Updated 3 years ago
- Disambiguating biomedical and clinical concepts with word embeddings ☆14 · Updated 6 years ago
- A collection of regular expressions for matching citations to state, federal, and even international law ☆33 · Updated 3 years ago
- Recipes for training OpenNMT systems ☆14 · Updated 7 years ago
- Automated NLP sentiment predictions: batteries included, or use your own data ☆18 · Updated 6 years ago