orgtre / google-books-ngram-frequency
Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code
☆62 · Updated last year
Alternatives and similar repositories for google-books-ngram-frequency:
Users who are interested in google-books-ngram-frequency are comparing it to the libraries listed below.
- All the words from Google Books, sorted by frequency ☆114 · Updated last year
- Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code ☆29 · Updated last month
- Python package for Wikimedia dump processing (Wiktionary, Wikipedia, etc.). Wikitext parsing, template expansion, Lua module execution. Fo… ☆98 · Updated 3 weeks ago
- A Python package for learning, evaluating, annotating, and extracting vector representations of construction grammars ☆36 · Updated 5 months ago
- A modern, interlingual wordnet interface for Python ☆235 · Updated 3 weeks ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆51 · Updated 3 years ago
- Python Finite-State Toolkit ☆53 · Updated 3 weeks ago
- Convert the CoNLL output of a dependency parser into a LaTeX or Graphviz tree ☆12 · Updated 4 years ago
- An English lexical database from the Big 🍎, let's go Mets baby love da Mets ☆15 · Updated 2 months ago
- ☆72 · Updated 3 weeks ago
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation) ☆41 · Updated last year
- Improved Sentence Alignment in Linear Time and Space ☆169 · Updated 2 years ago
- Sentence aligner ☆112 · Updated 3 years ago
- Interactive visualization of Wiktionary words and etymologies. ☆91 · Updated last month
- [LREC 2020] EtymDB, an Etymological DataBase (v2.1) ☆24 · Updated 3 years ago
- This packages up data for the Open Multilingual Wordnet ☆47 · Updated 2 weeks ago
- Small-vocabulary neural sequence-to-sequence generation with optional feature conditioning ☆33 · Updated 2 weeks ago
- Efficient Low-Memory Aligner ☆142 · Updated 2 months ago
- The Open English WordNet ☆522 · Updated last month
- Offline bilingual dictionaries made using data from Wiktionary ☆53 · Updated 9 years ago
- English web corpus with 4M tokens and several annotation types ☆26 · Updated last year
- Hanzipy is a Chinese character and NLP module for Chinese language processing in Python. It is primarily written to help provide a frame… ☆19 · Updated last year
- Python Multilingual Ucrel Semantic Analysis System ☆31 · Updated 7 months ago
- 🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python ☆62 · Updated 2 weeks ago
- A set of pipelines for performing experiments on various NLP tasks with a focus on resource-poor/minority languages. ☆35 · Updated this week
- Linguistically analyzed Classical Tibetan texts ☆26 · Updated 3 years ago
- Gather modern English word frequencies from all enwiki articles. ☆212 · Updated last year
- LingPy: Python library for quantitative tasks in historical linguistics ☆129 · Updated 2 weeks ago
- Multilingual sentence alignment using sentence embeddings ☆113 · Updated 4 months ago
- The World Atlas of Language Structures ☆60 · Updated 5 months ago