jonathandunn / corpus_similarity
Measure the similarity of text corpora for 74 languages
☆13 Updated last year
Alternatives and similar repositories for corpus_similarity
Users who are interested in corpus_similarity are comparing it to the libraries listed below.
- The Mueller Report Corpus V 0.1 ☆11 Updated 5 years ago
- TweetCaT - a tool for building Twitter corpora of smaller languages or specific geographical regions ☆12 Updated 8 years ago
- A python module for word inflections designed for use with spaCy. ☆92 Updated 5 years ago
- Repository for rstWeb, a browser based annotation interface for Rhetorical Structure Theory ☆43 Updated 8 months ago
- linguistic converter / merging tool for multi-level annotated corpora. graph-based (using Python and NetworkX). ☆50 Updated 2 years ago
- The Broad Twitter Corpus, an NER dataset in English stratified for time, location, social media genre, socioeconomic factors (COLING 2016… ☆68 Updated 3 years ago
- Alignment and annotation for comparable documents. ☆22 Updated 6 years ago
- ☆70 Updated 2 years ago
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g… ☆113 Updated 5 months ago
- Repository for the Georgetown University Multilayer Corpus (GUM) ☆98 Updated 2 weeks ago
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 Updated 5 years ago
- Language Tool style grammar handling with spaCy 2.0 ☆42 Updated 6 years ago
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe… ☆170 Updated 3 years ago
- Implementation of a simple frame identification approach (SimpleFrameId) described in the paper "Out-of-domain FrameNet Semantic Role Lab… ☆15 Updated 8 years ago
- PANiC - PAraphrasing Noun-Compounds ☆15 Updated 7 years ago
- Generate a SQLite database from Wikipedia & Wikidata dumps. ☆35 Updated last year
- Text readability metrics in Python. ☆11 Updated 11 years ago
- Python 3 library for processing historical English ☆67 Updated 11 months ago
- Python tools for interacting with Wikidata ☆154 Updated last year
- Identifying Historical People, Places and other Entities: Shared Task on Named Entity Recognition and Linking on Historical Newspapers at… ☆22 Updated 11 months ago
- A compound word splitter for Python ☆48 Updated 3 years ago
- spaCy pipeline component for adding text readability meta data to Doc objects. ☆56 Updated 6 years ago
- A spaCy wrapper for DBpedia Spotlight ☆110 Updated 2 years ago
- Python library to work with ConceptNet offline ☆10 Updated 2 years ago
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆65 Updated last year
- A Named-Entity Recogniser based on Grobid. ☆55 Updated 2 months ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆52 Updated 4 years ago
- STREUSLE: a corpus with comprehensive lexical semantic annotation (multiword expressions, supersenses) ☆66 Updated last month
- A tool for text normalisation via character-level machine translation ☆13 Updated 5 years ago
- A Word Sense Disambiguation system integrating implicit and explicit external knowledge. ☆69 Updated 3 years ago