IlyaSemenov / wikipedia-word-frequency
Gather modern English word frequencies from all enwiki articles.
☆213 Updated last year
Alternatives and similar repositories for wikipedia-word-frequency
Users who are interested in wikipedia-word-frequency compare it to the repositories listed below.
- A Python Wiktionary Parser ☆360 Updated 3 months ago
- Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code ☆35 Updated 3 months ago
- A modern, interlingual wordnet interface for Python ☆247 Updated this week
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆101 Updated 2 weeks ago
- Master repo for the UniMorph project, includes the UniMorph schema and annotated data files ☆30 Updated 5 years ago
- Sentence aligner ☆113 Updated 4 years ago
- Machine-readable lists of lemma-token pairs in 23 languages. ☆340 Updated 3 years ago
- Offline bilingual dictionaries made using data from Wiktionary ☆55 Updated 10 years ago
- A list of vocabulary lists ☆21 Updated 4 years ago
- Wiktionary parser tool for many language editions. ☆54 Updated 2 years ago
- List of English synonyms and antonyms parsed from the public domain book of James C. Fernald, 1896 ☆43 Updated 6 years ago
- Morphological Dictionaries for German Language ☆29 Updated 7 years ago
- Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies) ☆30 Updated 3 years ago
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages. ☆34 Updated 2 years ago
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. ☆246 Updated 2 years ago
- Open Language Profiles — English profile datasets from CEFR-J ☆126 Updated 5 years ago
- Lexical database for ~70k English words with morphological variables ☆44 Updated 3 years ago
- Verb forms dictionary ☆66 Updated 7 years ago
- A cloud-based, open-source system for writing and publishing dictionaries. ☆91 Updated last year
- A minimal, pure Python library to interface with CoNLL-U format files. ☆151 Updated last year
- Repository for the Georgetown University Multilayer Corpus (GUM) ☆97 Updated 2 weeks ago
- Tokenizer POS-Tagger and Dependency-parser with BERT/RoBERTa/DeBERTa/GPT models for Japanese and other languages ☆50 Updated last month
- WordNet in JSON format. ☆91 Updated 4 years ago
- ☆64 Updated last year
- [LREC 2020] EtymDB, an Etymological DataBase (v2.1) ☆24 Updated 3 years ago
- A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning tech… ☆73 Updated 8 months ago
- This packages up data for the Open Multilingual Wordnet ☆49 Updated this week
- English Lemma Database - Compiled by Referencing British National Corpus ☆31 Updated 8 months ago
- German Morphological Analyzer ☆47 Updated 3 years ago
- Morfessor is a tool for unsupervised and semi-supervised morphological segmentation ☆193 Updated 4 years ago