sts10 / common_word_list_maker
Scrapes Google Books Ngram data to create a long word list
★13, updated last year
Alternatives and similar repositories for common_word_list_maker
Users who are interested in common_word_list_maker are comparing it to the repositories listed below
- Combine and clean word lists (★95, updated this week)
- Quickly look up hashes in your terminal using the HashMob API (★13, updated 2 years ago)
- hashgen - the blazingly fast hash generator (★40, updated 2 weeks ago)
- A sentence segmentation library with wide language support optimized for speed and utility. (★86, updated 3 weeks ago)
- Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code (★103, updated 2 years ago)
- Script and sample dataset of all urban dictionary entry names (around 1.4 million total) (★96, updated 3 years ago)
- A repository for word lists I've generated (★35, updated last month)
- Wordlists designed for generating passphrases (★41, updated 2 weeks ago)
- xlsxgrep is a CLI tool to search text in XLSX, XLS, XLSM, CSV, TSV and ODS files. It works similarly to Unix/GNU Linux grep. (★51, updated last month)
- anewer appends lines from stdin to a file if they don't already exist in the file. This is a Rust version of https://github.com/tomnomnom… (★60, updated last year)
- Archive a reddit user's post history. Formatted overview of a profile, JSON containing every post, and picture downloads. Uses the pushs… (★52, updated 3 years ago)
- A polite and user-friendly downloader for Common Crawl data (★67, updated 5 months ago)
- lesspipe for ripgrep for common new filetypes using few dependencies (★33, updated 4 years ago)
- A tool for enumerating usernames from text, files, or websites (★82, updated 3 years ago)
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. (★58, updated 4 years ago)
- Analyze and help extract older "hidden" versions of a pdf from the current pdf. (★100, updated 3 years ago)
- Library for extracting text and timestamps from multiple subtitle files (.ass, .ssa, .srt, .sub, .txt). (★53, updated last year)
- Subdomain list based on Common Crawl data, sorted by popularity (★17, updated 6 years ago)
- DomainsProject.org HTTP worker (★25, updated 3 years ago)
- DomainsProject.org DNS worker (★26, updated last year)
- A simple Google query builder for document file discovery (★26, updated 10 months ago)
- The script uses a Google Maps API to download photos of places in the area specified by coordinates and search radius (★18, updated 2 years ago)
- A keylogger written in Rust to run on Windows (only educational) (★21, updated 6 years ago)
- ⚡ Blazing-fast tool to grab screenshots of your domain list right from terminal. (★432, updated 9 months ago)
- Spider - web crawler and local wordlist processor to generate frequency sorted wordlist / ngrams (★28, updated last month)
- Demeuk is a simple tool to clean up corpora (like dictionaries) or any dataset containing plain text strings. (★21, updated 6 months ago)
- Security and Privacy Failures in Popular 2FA Apps (★19, updated 2 years ago)
- An approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction (mirror of https://… (★37, updated last month)
- Extract metadata from a video to an sqlite database (★20, updated last year)
- Dumps all of the Key/Value pairs from a LevelDB database (★103, updated last month)