sts10 / common_word_list_maker
Scrapes Google Books Ngram data to create a long word list
★13 Updated 10 months ago
Alternatives and similar repositories for common_word_list_maker:
Users who are interested in common_word_list_maker are comparing it to the libraries listed below:
- Combine and clean word lists ★83 Updated 3 months ago
- A repository for word lists I've generated ★25 Updated 2 months ago
- Quickly look up hashes in your terminal using the HashMob API 🔥 ★10 Updated last year
- Wordlists designed for generating passphrases ★28 Updated 6 months ago
- Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code ★25 Updated 2 years ago
- A very poor and very simple local face recognition search engine ★16 Updated 9 months ago
- Strip attachments from local mbox files ★15 Updated last year
- an approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction ★32 Updated 3 months ago
- Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code ★56 Updated last year
- The Unicode Cookbook for Linguists ★53 Updated 4 years ago
- Markdown text to a novel in ePub and PDF. ★45 Updated 3 years ago
- This repository provides various Python methods for finding and aggregating synonyms for an individual word or a list of words. ★33 Updated last year
- Tools for compiling corpora from Common Crawl ★13 Updated last month
- Various pages and tools for working with non-Latin scripts ★36 Updated 2 weeks ago
- NGRAMS is a search engine for the Google Books Ngram Dataset. This repository contains documentation, discussions, announcements, and iss… ★16 Updated last year
- 5050 most frequent words in 109 languages ★39 Updated 2 years ago
- Lists of most-frequently-used English words / nouns / verbs etc. ★53 Updated 4 years ago
- convert SQL dumps and other leaked db dump formats to CSV ★49 Updated 8 months ago
- Auto updating archive of my Twitter lists. ★15 Updated last month
- Simple, fast dictionary-based language detector for short texts. ★13 Updated this week
- Automatically punctuate lecture transcripts obtained from YouTube. ★18 Updated 4 years ago
- Offline bilingual dictionaries made using data from Wiktionary ★52 Updated 9 years ago
- Offline etymological dictionary based on Wiktionary data ★21 Updated 2 years ago
- FieldWorks is a suite of software tools for language and cultural data, with support for complex scripts. ★86 Updated this week
- Lexical data at Unicode ★67 Updated 4 months ago
- Check the "health" of passwords in a KeePass database ★25 Updated last month
- Scripts for Internet Archive ★12 Updated 4 years ago
- Password Transformation Tool (ptt) is a versatile utility designed for password cracking. ★27 Updated last month
- The source of the phonetic transcriptions is Oxford Advanced Learner's Dictionary (3rd ed.), available from the Oxford Text Archive (http… ★23 Updated 7 years ago