sts10 / common_word_list_maker
Scrapes Google Books Ngram data to create a long word list
☆13 · Updated last year
Alternatives and similar repositories for common_word_list_maker:
Users who are interested in common_word_list_maker compare it to the repositories listed below.
- Combine and clean word lists ☆87 · Updated last month
- Wordlists designed for generating passphrases ☆31 · Updated this week
- A repository for word lists I've generated ☆30 · Updated 2 months ago
- The Carnegie Mellon Pronouncing Dictionary (CMUdict). ☆15 · Updated last month
- A sentence segmentation library with wide language support optimized for speed and utility. ☆61 · Updated 7 months ago
- Pandoc Lua filter for linguistic examples ☆39 · Updated last month
- Offline etymological dictionary based on Wiktionary data ☆21 · Updated 3 years ago
- The Unicode Cookbook for Linguists ☆53 · Updated 4 years ago
- Download an entire book (or publication) as a PDF file from the HathiTrust Digital Library without the "partner login" requirement ☆53 · Updated 6 months ago
- A tool to manipulate ePub files. ☆26 · Updated 4 years ago
- 📦 A collection of files for LibriVox recordings to produce ebooks with synchronized text and audio ☆25 · Updated 4 years ago
- An experimental implementation of Burrows's Delta in Python 3 ☆21 · Updated 3 years ago
- Quickly look up hashes in your terminal using the HashMob API 🔥 ☆12 · Updated 2 years ago
- Lists of most-frequently-used English words / nouns / verbs etc. ☆63 · Updated 4 years ago
- Simplified version of a Common Crawl fetcher ☆13 · Updated 2 weeks ago
- hashgen - the blazingly fast hash generator ☆34 · Updated 2 weeks ago
- Fast Neural Machine Translation in C++ - development repository ☆19 · Updated 11 months ago
- Unofficial Anna's Archive API written in JS. ☆40 · Updated last year
- A knowledge base theme for Hugo ☆11 · Updated 3 years ago
- Batch download books from libgen ☆16 · Updated 9 years ago
- Grabs data from IVRE and brings it into Obsidian notes ☆33 · Updated this week
- A versioned Python wrapper package for cmudict (https://github.com/cmusphinx/cmudict). ☆62 · Updated 2 weeks ago
- Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code ☆33 · Updated 2 months ago
- Faster, modernized fork of the language identification tool langid.py ☆55 · Updated 5 months ago
- Fast syllable estimation library based on pattern matching. ☆37 · Updated last month
- Security and Privacy Failures in Popular 2FA Apps ☆19 · Updated last year
- Subdomain list based on Common Crawl data, sorted by popularity ☆17 · Updated 5 years ago
- Script and sample dataset of all Urban Dictionary entry names (around 1.4 million total) ☆90 · Updated 2 years ago
- ☆24 · Updated 4 years ago
- Character-level conversion between Hebrew text and Latin transliteration using deep learning - a demonstration of seq2seq training. ☆13 · Updated last year