Access a database of word frequencies, in various natural languages.
☆1,633 · Jan 4, 2025 · Updated last year
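wordfreq answers questions such as "how common is this word in this language?" with functions like `word_frequency(word, lang)` and `zipf_frequency(word, lang)`. The Zipf scale it reports is the base-10 logarithm of how many times a word occurs per billion words. The sketch below does not call wordfreq. It assumes a tiny hypothetical toy corpus in place of the library's bundled multilingual data, and it only illustrates the two ideas involved: relative frequency and the Zipf conversion.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for wordfreq's bundled frequency data.
TOY_CORPUS = "the cat sat on the mat and the dog sat on the log"


def build_frequencies(text: str) -> dict[str, float]:
    """Map each lowercase whitespace token to count / total tokens."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def lookup(freqs: dict[str, float], word: str) -> float:
    """Relative frequency of a word (case-insensitive), 0.0 if unseen."""
    return freqs.get(word.lower(), 0.0)


def freq_to_zipf(freq: float) -> float:
    """Zipf value: log10 of occurrences per billion words.

    Unseen words (freq <= 0) are clamped to 0.0 instead of raising a
    math domain error from log10(0).
    """
    if freq <= 0:
        return 0.0
    return math.log10(freq * 1e9)


freqs = build_frequencies(TOY_CORPUS)
print(lookup(freqs, "the"))  # "the" is 4 of the 13 tokens
print(round(freq_to_zipf(lookup(freqs, "the")), 2))
print(freq_to_zipf(lookup(freqs, "zebra")))  # unseen word
```

In the real library, both lookups take a language code such as `'en'`, and the data comes from many corpora rather than a single string, so values for the same word will differ from this toy example.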
Alternatives and similar repositories for wordfreq
Users who are interested in wordfreq compare it with the libraries listed below.
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆59 · Jul 1, 2021 · Updated 4 years ago
- Fixes mojibake and other glitches in Unicode text, after the fact. ☆4,013 · Oct 30, 2024 · Updated last year
- 💫 Industrial-strength Natural Language Processing (NLP) in Python ☆33,283 · Updated this week
- PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, an… ☆477 · Sep 14, 2023 · Updated 2 years ago
- The Open English WordNet ☆734 · Feb 4, 2026 · Updated last month
- 🦆 Contextually-keyed word vectors ☆1,673 · Apr 23, 2025 · Updated 10 months ago
- 🪼 a python library for doing approximate and phonetic matching of strings. ☆2,193 · Dec 15, 2025 · Updated 2 months ago
- Multilingual text (NLP) processing toolkit ☆2,366 · Nov 10, 2023 · Updated 2 years ago
- Beautiful visualizations of how language differs among document types. ☆2,331 · Apr 29, 2025 · Updated 10 months ago
- 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows ☆12,247 · Feb 25, 2026 · Updated last week
- Wiktionary dump file parser and multilingual data extractor ☆1,113 · Feb 27, 2026 · Updated last week
- ☆1,316 · Jul 18, 2022 · Updated 3 years ago
- Python implementation of TextRank algorithms ("textgraphs") for phrase extraction ☆2,209 · Feb 15, 2026 · Updated 2 weeks ago
- python package to calculate readability statistics of a text object - paragraphs, sentences, articles. ☆1,352 · Feb 18, 2026 · Updated 2 weeks ago
- Entity linker for the newspaper collection of the National Library of the Netherlands. Links named entity mentions to DBpedia description… ☆11 · Dec 8, 2022 · Updated 3 years ago
- NLP, before and after spaCy ☆2,235 · Sep 22, 2023 · Updated 2 years ago
- A minimal, pure Python library to interface with CoNLL-U format files. ☆153 · Dec 5, 2025 · Updated 3 months ago
- SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm ☆3,382 · Jan 20, 2026 · Updated last month
- 🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box. ☆904 · Aug 20, 2024 · Updated last year
- rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc. ☆9,468 · Jan 20, 2026 · Updated last month
- An open source multi-tool for exploring and publishing data ☆10,805 · Feb 26, 2026 · Updated last week
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ☆14,354 · Oct 27, 2025 · Updated 4 months ago
- This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of th… ☆4,323 · May 17, 2023 · Updated 2 years ago
- A modern, interlingual wordnet interface for Python ☆286 · Updated this week
- German lemmatization with IWNLP as extension for spaCy ☆26 · Jul 28, 2023 · Updated 2 years ago
- Toolkit to help understand "what lies" in word embeddings. Also benchmarking! ☆475 · Feb 6, 2023 · Updated 3 years ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ☆32,170 · Sep 30, 2025 · Updated 5 months ago
- Module for automatic summarization of text documents and HTML pages. ☆3,662 · Feb 14, 2026 · Updated 2 weeks ago
- Library for fast text representation and classification. ☆26,501 · Mar 22, 2024 · Updated last year
- newspaper3k is a news, full-text, and article metadata extraction library in Python 3. ☆14,997 · Dec 6, 2025 · Updated 3 months ago
- 📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage. ☆3,517 · Apr 18, 2025 · Updated 10 months ago
- A tool for learning vector representations of words and entities from Wikipedia ☆964 · May 3, 2024 · Updated last year
- Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk ☆14,169 · Oct 29, 2025 · Updated 4 months ago
- Persistent dict, backed by sqlite3 and pickle, multithread-safe. ☆1,245 · Dec 7, 2022 · Updated 3 years ago
- Accurately generate all possible forms of an English word, e.g. "election" --> "elect", "electoral", "electorate", etc. ☆633 · Jun 24, 2021 · Updated 4 years ago
- just a bunch of useful embeddings for scikit-learn pipelines ☆522 · Feb 12, 2026 · Updated 3 weeks ago
- Correctly generate plurals, ordinals, indefinite articles; convert numbers to words ☆1,067 · May 14, 2025 · Updated 9 months ago
- Hy-phen-ation made easy ☆219 · Jan 5, 2026 · Updated 2 months ago
- Unsupervised text tokenizer for Neural Network-based text generation. ☆11,677 · Updated this week