Access a database of word frequencies, in various natural languages.
☆1,646 · Jan 4, 2025 · Updated last year
Alternatives and similar repositories for wordfreq
Users that are interested in wordfreq are comparing it to the libraries listed below.
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆62 · Jul 1, 2021 · Updated 4 years ago
- Repository for Frequency Word List Generator and processed files ☆1,468 · Feb 7, 2022 · Updated 4 years ago
- Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼 ☆22 · Jan 29, 2026 · Updated 2 months ago
- Fixes mojibake and other glitches in Unicode text, after the fact. ☆4,025 · Oct 30, 2024 · Updated last year
- PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, an… ☆477 · Sep 14, 2023 · Updated 2 years ago
- This repository contains the Potsdam Textbook Corpus (PoTeC) which is a natural reading eye-tracking corpus. ☆14 · Mar 18, 2026 · Updated 3 weeks ago
- Wiktionary dump file parser and multilingual data extractor ☆1,134 · Mar 30, 2026 · Updated 2 weeks ago
- The Open English WordNet ☆758 · Apr 7, 2026 · Updated last week
- 💫 Industrial-strength Natural Language Processing (NLP) in Python ☆33,425 · Mar 28, 2026 · Updated 2 weeks ago
- Beautiful visualizations of how language differs among document types. ☆2,328 · Apr 29, 2025 · Updated 11 months ago
- python package to calculate readability statistics of a text object - paragraphs, sentences, articles. ☆1,363 · Feb 18, 2026 · Updated last month
- 🪼 a python library for doing approximate and phonetic matching of strings. ☆2,205 · Apr 7, 2026 · Updated last week
- 🦆 Contextually-keyed word vectors ☆1,673 · Mar 27, 2026 · Updated 2 weeks ago
- German lemmatization with IWNLP as extension for spaCy ☆27 · Jul 28, 2023 · Updated 2 years ago
- Entity linker for the newspaper collection of the National Library of the Netherlands. Links named entity mentions to DBpedia description… ☆11 · Dec 8, 2022 · Updated 3 years ago
- Text readability metrics in Python. ☆11 · Aug 29, 2013 · Updated 12 years ago
- ☆1,319 · Jul 18, 2022 · Updated 3 years ago
- Multilingual text (NLP) processing toolkit ☆2,369 · Nov 10, 2023 · Updated 2 years ago
- Python implementation of TextRank algorithms ("textgraphs") for phrase extraction ☆2,211 · Feb 15, 2026 · Updated 2 months ago
- A modern, interlingual wordnet interface for Python ☆290 · Mar 21, 2026 · Updated 3 weeks ago
- NLP, before and after spaCy ☆2,239 · Sep 22, 2023 · Updated 2 years ago
- SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm ☆3,392 · Jan 20, 2026 · Updated 2 months ago
- This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of th… ☆4,366 · May 17, 2023 · Updated 2 years ago
- 🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box. ☆912 · Aug 20, 2024 · Updated last year
- An LL parser for extracting information from Wiki text, particularly Wiktionary. ☆50 · Aug 16, 2023 · Updated 2 years ago
- A minimal, pure Python library to interface with CoNLL-U format files. ☆154 · Dec 5, 2025 · Updated 4 months ago
- 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows ☆12,395 · Apr 8, 2026 · Updated last week
- ☆15 · Mar 2, 2026 · Updated last month
- Accurately generate all possible forms of an English word e.g "election" --> "elect", "electoral", "electorate" etc. ☆633 · Jun 24, 2021 · Updated 4 years ago
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ☆14,363 · Oct 27, 2025 · Updated 5 months ago
- 🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy ☆338 · Apr 25, 2025 · Updated 11 months ago
- Hy-phen-ation made easy ☆222 · Jan 5, 2026 · Updated 3 months ago
- Code for building ConceptNet from raw data. ☆2,935 · Jan 19, 2023 · Updated 3 years ago
- 📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage. ☆3,527 · Apr 18, 2025 · Updated 11 months ago
- Toolkit to help understand "what lies" in word embeddings. Also benchmarking! ☆473 · Feb 6, 2023 · Updated 3 years ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ☆32,201 · Sep 30, 2025 · Updated 6 months ago
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe… ☆171 · Dec 15, 2021 · Updated 4 years ago
- ✔️ Contextual word checker for better suggestions (not actively maintained) ☆419 · Jan 31, 2025 · Updated last year
- just a bunch of useful embeddings for scikit-learn pipelines ☆525 · Feb 12, 2026 · Updated 2 months ago