orgtre / google-books-ngram-frequency
Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code
☆49 · Updated last year
Related projects
Alternatives and complementary repositories for google-books-ngram-frequency
- All the words from Google Books, sorted by frequency ☆109 · Updated last year
- A Python package for learning, evaluating, annotating, and extracting vector representations of construction grammars ☆34 · Updated 3 weeks ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆51 · Updated 3 years ago
- MFTE (Multi Feature Tagger of English) Python is the Python version based on Le Foll's MFTE written in Perl. It is extended to include se… ☆20 · Updated 3 months ago
- This will download and process the Google Ngram data. ☆13 · Updated last year
- Python Multilingual Ucrel Semantic Analysis System ☆30 · Updated 2 months ago
- Creates interlinearized versions of books (EPUB, MOBI, etc.), adding "subtitles" with translations under each word in the text. ☆22 · Updated 4 years ago
- The World Atlas of Language Structures ☆55 · Updated 3 weeks ago
- [LREC 2020] EtymDB, an Etymological DataBase (v2.1) ☆21 · Updated 2 years ago
- ☆28 · Updated 2 weeks ago
- Python version for Doug Biber's Multidimensional Analysis (MDA) ☆27 · Updated 4 months ago
- Linguistics tree drawing to SVG in Python, aimed at Jupyter ☆62 · Updated 2 months ago
- The Unicode Cookbook for Linguists ☆53 · Updated 3 years ago
- The spoken L1 corpus represents present-day spoken Chinese (Putonghua) used in mainland China, which is designed as a comparable corpus t… ☆17 · Updated 3 years ago
- CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates ☆43 · Updated last year
- Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code ☆22 · Updated 2 years ago
- This packages up data for the Open Multilingual Wordnet ☆43 · Updated last week
- Uncover Old Chinese textual parallels based on sound ☆12 · Updated this week
- Searching in-memory corpus with Corpus Query Language (CQL) ☆17 · Updated 3 years ago
- Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies) ☆27 · Updated 3 years ago
- ☆11 · Updated 8 months ago
- University of Colorado VerbNet ☆101 · Updated 5 months ago
- ☆56 · Updated this week
- Extracts plain text, language identification and more metadata from WARC records ☆20 · Updated 3 months ago
- Linguistics backend ☆40 · Updated last year
- Various utilities for processing the data. ☆205 · Updated this week
- Gather modern English word frequencies from all enwiki articles. ☆202 · Updated 8 months ago
- A list of vocabulary lists ☆21 · Updated 4 years ago
- Wikipedia Bilingual Reference Data (English) ☆15 · Updated 8 years ago
- Multilingual syllable annotation pipeline component for spaCy ☆37 · Updated last year
- Python Finite-State Toolkit ☆44 · Updated 3 months ago