rspeer/wordfreq

Access a database of word frequencies, in various natural languages.

☆1,714

Alternatives and similar repositories for wordfreq

Users who are interested in wordfreq compare it to the libraries listed below.

LuminosoInsight / exquisite-corpus
Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings.
☆62 · Jul 1, 2021 · Updated 5 years ago
hermitdave / FrequencyWords
Repository for Frequency Word List Generator and processed files
☆1,523 · Feb 7, 2022 · Updated 4 years ago
proycon / pynlpl
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, an…
☆476 · Sep 14, 2023 · Updated 2 years ago
rspeer / python-ftfy
Fixes mojibake and other glitches in Unicode text, after the fact.
☆4,051 · Oct 30, 2024 · Updated last year
tatuylonen / wiktextract
Wiktionary dump file parser and multilingual data extractor
☆1,230 · Jul 22, 2026 · Updated last week
explosion / spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
☆33,780 · May 19, 2026 · Updated 2 months ago
JasonKessler / scattertext
Beautiful visualizations of how language differs among document types.
☆2,337 · Jul 4, 2026 · Updated 3 weeks ago
jamesturk / jellyfish
🪼 A Python library for approximate and phonetic matching of strings.
☆2,227 · Updated this week
textstat / textstat
Python package to calculate readability statistics of a text object: paragraphs, sentences, articles.
☆1,376 · Feb 18, 2026 · Updated 5 months ago
globalwordnet / english-wordnet
The Open English WordNet
☆838 · Updated this week
Liebeck / spacy-iwnlp
German lemmatization with IWNLP as an extension for spaCy
☆27 · Apr 13, 2026 · Updated 3 months ago
explosion / sense2vec
🦆 Contextually-keyed word vectors
☆1,678 · Mar 27, 2026 · Updated 4 months ago
KBNLresearch / dac
Entity linker for the newspaper collection of the National Library of the Netherlands. Links named entity mentions to DBpedia description…
☆11 · Dec 8, 2022 · Updated 3 years ago
commonsense / conceptnet-numberbatch
☆1,322 · Jul 18, 2022 · Updated 4 years ago
burrsettles / readability
Text readability metrics in Python.
☆11 · Aug 29, 2013 · Updated 12 years ago
goodmami / wn
A modern, interlingual wordnet interface for Python
☆296 · Mar 21, 2026 · Updated 4 months ago
DerwenAI / pytextrank
Python implementation of TextRank algorithms ("textgraphs") for phrase extraction
☆2,219 · Jun 24, 2026 · Updated last month
chartbeat-labs / textacy
NLP, before and after spaCy
☆2,239 · Sep 22, 2023 · Updated 2 years ago
aboSamoor / polyglot
Multilingual text (NLP) processing toolkit
☆2,364 · Nov 10, 2023 · Updated 2 years ago
wolfgarbe / SymSpell
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
☆3,454 · Jul 4, 2026 · Updated 3 weeks ago
first20hours / google-10000-english
This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of th…
☆4,435 · May 17, 2023 · Updated 3 years ago
rspeer / wikiparsec
An LL parser for extracting information from Wiki text, particularly Wiktionary.
☆51 · Aug 16, 2023 · Updated 2 years ago
nipunsadvilkar / pySBD
🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection tool that works out of the box.
☆926 · Aug 20, 2024 · Updated last year
pyconll / pyconll
A minimal, pure Python library to interface with CoNLL-U format files.
☆155 · Jul 6, 2026 · Updated 3 weeks ago
gutfeeling / word_forms
Accurately generate all possible forms of an English word, e.g. "election" --> "elect", "electoral", "electorate", etc.
☆630 · Jun 24, 2021 · Updated 5 years ago
explosion / floret
🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy
☆343 · Apr 25, 2025 · Updated last year
flairNLP / flair
A very simple framework for state-of-the-art Natural Language Processing (NLP)
☆14,381 · Oct 27, 2025 · Updated 9 months ago
neuml / txtai
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
☆12,761 · Updated this week
Kozea / Pyphen
Hy-phen-ation made easy
☆230 · Jun 19, 2026 · Updated last month
commonsense / conceptnet5
Code for building ConceptNet from raw data.
☆2,952 · Jan 19, 2023 · Updated 3 years ago
life4 / textdistance
📐 Compute distance between sequences. 30+ algorithms, pure Python implementation, common interface, optional external libs usage.
☆3,535 · Apr 18, 2025 · Updated last year
koaning / whatlies
Toolkit to help understand "what lies" in word embeddings. Also benchmarking!
☆481 · Feb 6, 2023 · Updated 3 years ago
R1j1t / contextualSpellCheck
✔️ Contextual word checker for better suggestions (not actively maintained)
☆420 · Jan 31, 2025 · Updated last year
chemicaltree / tetra
☆10 · Sep 14, 2022 · Updated 3 years ago
dmort27 / epitran
A tool for transcribing orthographic text as IPA (International Phonetic Alphabet)
☆828 · Jun 18, 2026 · Updated last month
facebookresearch / LASER
Language-Agnostic SEntence Representations
☆3,662 · May 2, 2024 · Updated 2 years ago
facebookresearch / fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
☆32,250 · Sep 30, 2025 · Updated 9 months ago
koaning / embetter
Just a bunch of useful embeddings for scikit-learn pipelines
☆527 · Feb 12, 2026 · Updated 5 months ago
Mimino666 / langdetect
Port of Google's language-detection library to Python.
☆1,897 · Mar 3, 2025 · Updated last year