akb89 / witokit
A Python toolkit to generate a tokenized dump of Wikipedia for NLP
☆11 · Updated 8 months ago
Alternatives and similar repositories for witokit:
Users who are interested in witokit are comparing it to the libraries listed below:
- Scripts for building a geo-located web corpus using Common Crawl data ☆11 · Updated 2 months ago
- Bayesian Assessment of Hypotheses ☆24 · Updated last year
- Analyze Argumentation and Rhetorical Aspects in Scientific Writing. ☆19 · Updated 2 years ago
- PANiC - PAraphrasing Noun-Compounds ☆15 · Updated 6 years ago
- A simple neural truecaser written in pytorch and allennlp. ☆32 · Updated 7 months ago
- Learned string similarity for entity names using optimal transport. ☆34 · Updated 4 years ago
- Finds linguistic patterns effortlessly ☆34 · Updated last year
- Featurize words into orthographic and phonological vectors. ☆40 · Updated last year
- Converter from UD-trees to BART representation ☆36 · Updated 10 months ago
- Several algorithms for converting dependency structures into constituency structures. ☆10 · Updated 2 years ago
- bin files ☆13 · Updated last month
- Leaderboards are widely used in NLP and push the field forward. While leaderboards are a straightforward ranking of NLP models, this simp… ☆17 · Updated 2 years ago
- ADS Project ☆14 · Updated 9 years ago
- Tool to fix bitexts and tag near-duplicates for removal ☆29 · Updated 5 months ago
- GC4LM: A Colossal (Biased) language model for German ☆13 · Updated 3 years ago
- List of corpora annotated for coreference for different languages ☆17 · Updated 5 months ago
- Semeval-2021 Multilingual and Cross-lingual Word-in-Context Task ☆18 · Updated 3 years ago
- Reference-less Quality Estimation of Text Simplification Systems ☆48 · Updated last year
- A re-implementation of redpony/cdec's tokenize-anything.pl script in python ☆8 · Updated 8 years ago
- Convert CoNLL output of a dependency parser into a latex or graphviz tree ☆12 · Updated 4 years ago
- An example of how to use spaCy for extremely large files without running into memory issues ☆36 · Updated 2 years ago
- Deep learning model of machine translation using attentional and structural biases ☆13 · Updated 7 years ago
- Code and data for: Low Resource Grammatical Error Correction Using Wikipedia Edits (WNUT 2018) ☆14 · Updated 6 months ago
- Multilingual Open Text ☆25 · Updated 2 months ago
- Multilingual Language Modeling Toolkit ☆11 · Updated 7 years ago
- 💫 A spaCy package for Yohei Tamura's Rust tokenizations library ☆27 · Updated last year
- A web interface to understand language-specific BERT-models ☆17 · Updated 9 months ago
- SeqScore: Scoring for named entity recognition and other sequence labeling tasks ☆22 · Updated 2 weeks ago
- A python module to process data for Frame Semantic Parsing ☆23 · Updated 4 years ago
- Crawling engine that crawls a set of top-level domains looking for documents in a list of languages ☆10 · Updated 11 months ago