pmandera / duometer
Near-duplicate detection tool
☆24 Updated 8 years ago
Alternatives and similar repositories for duometer
Users that are interested in duometer are comparing it to the libraries listed below
- Simple natural language parsing and semantic grounding ☆10 Updated 4 years ago
- Pipeline for distributed Natural Language Processing, made in Python ☆65 Updated 8 years ago
- framework for making streamcorpus data ☆11 Updated 8 years ago
- Deployment of pywb as a CommonCrawl Index Server ☆21 Updated 7 years ago
- U.S. Code Complexity ☆23 Updated 11 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED] ☆24 Updated 8 years ago
- ☆11 Updated 6 years ago
- Check out https://github.com/webrecorder/webrecorder for newer version matching https://webrecorder.io ☆38 Updated 9 years ago
- PDF Extraction Toolkit ☆41 Updated 4 years ago
- A tool for calculating semantic similarity between words from a text corpus based on lexico-syntactic patterns. ☆27 Updated 9 years ago
- Raw Wikipedia counts for entity linking ☆19 Updated 8 years ago
- Calculate political polarization scores for members of U.S. Congress based on their tweets ☆11 Updated 7 years ago
- Python natural language processing work ☆29 Updated 15 years ago
- stav text annotation visualiser ☆34 Updated 13 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆17 Updated 9 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 5 years ago
- Navigating around a grid of cells like XPath for spreadsheets; supports Python 3.5+ ☆48 Updated 2 years ago
- The open source tools for building, maintaining and deploying Topic Maps-based applications. ☆57 Updated this week
- 📑 Python Package to reconstruct the original continuous text from PDFs with language models ☆32 Updated last year
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 Updated 11 years ago
- Personal Knowledge Management System. Capture your ideas using plain old text files. Make a journal that lasts 100 years. ☆29 Updated last year
- CROMER (CROss-document Main Events and entities Recognition), is a tool for cross-document coreference ☆12 Updated 10 years ago
- Easily identify and label sentence intervals using various taggers. ☆16 Updated 8 years ago
- NYT Risk Semantics Project ☆12 Updated 9 years ago
- A financial disclosure data extraction tool. ☆16 Updated last year
- ProbLog 2 is now at https://github.com/ML-KULeuven/problog ☆10 Updated 6 years ago
- Framework for creating and accessing UBY resources – sense-linked lexical resources in standard UBY-LMF format ☆22 Updated 7 years ago
- Semanticizest: dump parser and client ☆20 Updated 9 years ago
- A Python library for learning from dimensionality reduction, supporting sparse and dense matrices. ☆78 Updated 8 years ago
- Some convenient natural language tools that build on NLTK. ☆85 Updated 11 years ago