orgtre / google-books-ngram-frequency
Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code
☆50 Updated last year
Related projects
Alternatives and complementary repositories for google-books-ngram-frequency
- Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code ☆23 Updated 2 years ago
- All the words from Google Books, sorted by frequency ☆109 Updated last year
- Lists of most-frequently-used English words / nouns / verbs etc. ☆49 Updated 4 years ago
- A Python package for learning, evaluating, annotating, and extracting vector representations of construction grammars ☆34 Updated last month
- A modern, interlingual wordnet interface for Python ☆221 Updated last week
- Open Language Profiles — English profile datasets from CEFR-J ☆103 Updated 4 years ago
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆94 Updated this week
- Python Multilingual Ucrel Semantic Analysis System ☆30 Updated 3 months ago
- A versioned Python wrapper package for cmudict (https://github.com/cmusphinx/cmudict). ☆62 Updated 2 months ago
- Offline bilingual dictionaries made using data from Wiktionary ☆52 Updated 9 years ago
- MFTE (Multi Feature Tagger of English) Python is the Python version based on Le Foll's MFTE written in Perl. It is extended to include se… ☆21 Updated 3 months ago
- Han character library for CJKV languages ☆150 Updated 3 years ago
- The spoken L1 corpus represents present-day spoken Chinese (Putonghua) used in mainland China, which is designed as a comparable corpus t… ☆17 Updated 3 years ago
- The Unicode Cookbook for Linguists ☆53 Updated 4 years ago
- Various utilities for processing the data. ☆207 Updated this week
- The source of the phonetic transcriptions is Oxford Advanced Learner's Dictionary (3rd ed.), available from the Oxford Text Archive (http… ☆22 Updated 7 years ago
- Improved Sentence Alignment in Linear Time and Space ☆163 Updated last year
- Etymological graphs based on Wiktionary dumps ☆18 Updated last year
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆52 Updated 3 years ago
- Sentence aligner ☆108 Updated 3 years ago
- NGRAMS is a search engine for the Google Books Ngram Dataset. This repository contains documentation, discussions, announcements, and iss… ☆14 Updated last year
- University of Colorado VerbNet ☆101 Updated 6 months ago
- Tokenizer POS-Tagger and Dependency-parser with BERT/RoBERTa/DeBERTa models for Japanese and other languages ☆48 Updated last month
- Efficient Low-Memory Aligner ☆139 Updated 2 months ago
- Gather modern English word frequencies from all enwiki articles. ☆204 Updated 8 months ago
- This packages up data for the Open Multilingual Wordnet ☆43 Updated 3 weeks ago
- A tool to find grammar patterns in Chinese text ☆24 Updated 4 years ago
- British English pronunciation dictionary ☆89 Updated 7 years ago
- A set of pipelines for performing experiments on various NLP tasks with a focus on resource-poor/minority languages. ☆35 Updated last week