Access a database of word frequencies in various natural languages.
☆1,641 · Jan 4, 2025 · Updated last year
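A word-frequency database answers one question: how common is a given word in a given language? wordfreq itself exposes functions such as `word_frequency` and `zipf_frequency`, backed by precomputed tables built from many sources. The sketch below does not use wordfreq. It is a minimal standard-library illustration of what such a table stores: each word's share of all tokens, optionally reported on the Zipf scale (the base-10 logarithm of occurrences per billion words). The toy corpus, the regex tokenizer, and the helper names `build_frequency_table`, `word_share`, and `zipf` are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; wordfreq instead ships tables built from many large sources.
corpus = "The cat sat on the mat, and the dog sat too."


def tokenize(text: str) -> list[str]:
    # Deliberately simple tokenizer (an assumption): lowercase the text and
    # keep runs of ASCII letters and apostrophes. Not multilingual.
    return re.findall(r"[a-z']+", text.lower())


def build_frequency_table(text: str) -> dict[str, float]:
    # Map each word to its proportion of all tokens in the corpus.
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def word_share(table: dict[str, float], word: str) -> float:
    # Look up a word case-insensitively; unknown words get 0.0.
    return table.get(word.lower(), 0.0)


def zipf(freq: float) -> float:
    # Zipf scale: log10(occurrences per billion words) = log10(freq) + 9,
    # rounded to two decimal places. Unknown words map to 0.0.
    if freq <= 0:
        return 0.0
    return round(math.log10(freq) + 9, 2)


table = build_frequency_table(corpus)

if __name__ == "__main__":
    for w in ("the", "sat", "unicorn"):
        share = word_share(table, w)
        print(w, round(share, 4), zipf(share))
```

In this 11-token corpus "the" occurs 3 times, so its share is 3/11 and its Zipf value is about 8.44. Real frequency lists differ mainly in scale and in tokenization; a single ASCII regex like this one cannot handle the range of languages and scripts that wordfreq covers.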
Alternatives and similar repositories for wordfreq
Users that are interested in wordfreq are comparing it to the libraries listed below.
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆61 · Jul 1, 2021 · Updated 4 years ago
- Repository for Frequency Word List Generator and processed files ☆1,463 · Feb 7, 2022 · Updated 4 years ago
- Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼 ☆22 · Jan 29, 2026 · Updated last month
- Fixes mojibake and other glitches in Unicode text, after the fact. ☆4,017 · Oct 30, 2024 · Updated last year
- PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, an… ☆477 · Sep 14, 2023 · Updated 2 years ago
- This repository contains the Potsdam Textbook Corpus (PoTeC), a natural reading eye-tracking corpus. ☆14 · Mar 18, 2026 · Updated last week
- Wiktionary dump file parser and multilingual data extractor ☆1,125 · Updated this week
- The Open English WordNet ☆752 · Mar 20, 2026 · Updated last week
- 💫 Industrial-strength Natural Language Processing (NLP) in Python ☆33,352 · Mar 15, 2026 · Updated last week
- Beautiful visualizations of how language differs among document types. ☆2,329 · Apr 29, 2025 · Updated 10 months ago
- Python package to calculate readability statistics of a text object (paragraphs, sentences, articles). ☆1,356 · Feb 18, 2026 · Updated last month
- 🪼 A Python library for approximate and phonetic matching of strings. ☆2,201 · Mar 10, 2026 · Updated 2 weeks ago
- 🦆 Contextually-keyed word vectors ☆1,672 · Apr 23, 2025 · Updated 11 months ago
- German lemmatization with IWNLP as an extension for spaCy ☆27 · Jul 28, 2023 · Updated 2 years ago
- Entity linker for the newspaper collection of the National Library of the Netherlands. Links named entity mentions to DBpedia description… ☆11 · Dec 8, 2022 · Updated 3 years ago
- Text readability metrics in Python. ☆11 · Aug 29, 2013 · Updated 12 years ago
- Multilingual text (NLP) processing toolkit ☆2,369 · Nov 10, 2023 · Updated 2 years ago
- ☆1,319 · Jul 18, 2022 · Updated 3 years ago
- A modern, interlingual wordnet interface for Python ☆290 · Updated this week
- Python implementation of TextRank algorithms ("textgraphs") for phrase extraction ☆2,208 · Feb 15, 2026 · Updated last month
- NLP, before and after spaCy ☆2,237 · Sep 22, 2023 · Updated 2 years ago
- SymSpell: 1 million times faster spelling correction & fuzzy search through the Symmetric Delete spelling correction algorithm ☆3,389 · Jan 20, 2026 · Updated 2 months ago
- This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of th… ☆4,346 · May 17, 2023 · Updated 2 years ago
- 🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection module that works out of the box. ☆908 · Aug 20, 2024 · Updated last year
- An LL parser for extracting information from Wiki text, particularly Wiktionary. ☆50 · Aug 16, 2023 · Updated 2 years ago
- A minimal, pure Python library to interface with CoNLL-U format files. ☆154 · Dec 5, 2025 · Updated 3 months ago
- 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows ☆12,322 · Updated this week
- ☆15 · Mar 2, 2026 · Updated 3 weeks ago
- Accurately generate all possible forms of an English word, e.g. "election" --> "elect", "electoral", "electorate", etc. ☆632 · Jun 24, 2021 · Updated 4 years ago
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ☆14,352 · Oct 27, 2025 · Updated 4 months ago
- 🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy ☆336 · Apr 25, 2025 · Updated 11 months ago
- Hy-phen-ation made easy ☆220 · Jan 5, 2026 · Updated 2 months ago
- Code for building ConceptNet from raw data. ☆2,932 · Jan 19, 2023 · Updated 3 years ago
- 📐 Compute distance between sequences. 30+ algorithms, pure Python implementation, common interface, optional use of external libraries. ☆3,525 · Apr 18, 2025 · Updated 11 months ago
- Toolkit to help understand "what lies" in word embeddings. Also benchmarking! ☆473 · Feb 6, 2023 · Updated 3 years ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ☆32,190 · Sep 30, 2025 · Updated 5 months ago
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe… ☆171 · Dec 15, 2021 · Updated 4 years ago
- ✔️ Contextual word checker for better suggestions (not actively maintained) ☆419 · Jan 31, 2025 · Updated last year
- A tool for transcribing orthographic text as IPA (International Phonetic Alphabet) ☆802 · Mar 8, 2026 · Updated 2 weeks ago