hackerb9 / gwordlist
All the words from Google Books, sorted by frequency
☆116 Updated last year
Alternatives and similar repositories for gwordlist
Users who are interested in gwordlist are comparing it to the repositories listed below
- Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code ☆71 Updated last year
- Offline bilingual dictionaries made using data from Wiktionary ☆55 Updated 10 years ago
- Tokenizes Chinese texts into words. ☆98 Updated 2 years ago
- Generation of bilingual dictionaries from Wiktionary/dbnary data for the WikDict project ☆49 Updated 7 months ago
- Verb forms dictionary ☆66 Updated 7 years ago
- A list of words from the SUBTLEX movie subtitles corpus, sorted by frequency. ☆33 Updated 5 years ago
- Wiktionary parser tool for many language editions. ☆54 Updated 2 years ago
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages. ☆34 Updated 2 years ago
- WordNet in JSON format. ☆91 Updated 4 years ago
- Master repo for the UniMorph project, includes the UniMorph schema and annotated data files ☆30 Updated 5 years ago
- A Python Wiktionary Parser ☆360 Updated 3 months ago
- PHOIBLE Online ☆42 Updated 2 years ago
- X-SAMPA to IPA converter ☆25 Updated 4 years ago
- English Lemma Database - Compiled by Referencing British National Corpus ☆31 Updated 8 months ago
- Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies) ☆30 Updated 3 years ago
- Lexical data at Unicode ☆68 Updated 9 months ago
- The Open English WordNet ☆558 Updated last week
- Pipeline to generate the Standardized Project Gutenberg Corpus ☆184 Updated last year
- Gather modern English word frequencies from all enwiki articles. ☆213 Updated last year
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆101 Updated 2 weeks ago
- Text to IPA converter in JavaScript ☆57 Updated 2 years ago
- A set of utilities for processing MediaWiki XML dump data. ☆53 Updated 3 months ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆51 Updated 3 years ago
- Interactive visualization of Wiktionary words and etymologies. ☆92 Updated 3 months ago
- Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code ☆35 Updated 3 months ago
- Crawler for linguistic corpora ☆204 Updated last year
- A list of vocabulary lists ☆21 Updated 4 years ago
- About 6,500 Irish lemmas ordered by corpus frequency, with noise removed. ☆34 Updated 7 years ago
- A versioned python wrapper package for cmudict (https://github.com/cmusphinx/cmudict). ☆63 Updated last month
- The Unicode Cookbook for Linguists ☆54 Updated 4 years ago