rspeer / wordfreq
Access a database of word frequencies, in various natural languages.
☆1,434 · Updated last month
Alternatives and similar repositories for wordfreq:
Users who are interested in wordfreq are comparing it to the libraries listed below:
- A modern, interlingual wordnet interface for Python ☆233 · Updated last week
- The Open English WordNet ☆505 · Updated 3 weeks ago
- A Python Wiktionary Parser ☆358 · Updated last year
- A Python parser for MediaWiki wikicode ☆780 · Updated last month
- English word segmentation, written in pure-Python, and based on a trillion-word corpus. ☆372 · Updated 2 years ago
- Accurately generate all possible forms of an English word, e.g. "election" --> "elect", "electoral", "electorate", etc. ☆629 · Updated 3 years ago
- NLP, before and after spaCy ☆2,215 · Updated last year
- Things you can do with the token embeddings of an LLM ☆1,422 · Updated 2 weeks ago
- Fast implementation of the edit distance (Levenshtein distance) ☆668 · Updated last year
- Stand-alone language identification system ☆2,352 · Updated 5 years ago
- 📰 Newspaper4k, a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites. ☆620 · Updated 8 months ago
- Rapid fuzzy string matching in Python using various string metrics ☆2,893 · Updated 2 weeks ago
- Multilingual text (NLP) processing toolkit ☆2,322 · Updated last year
- Python package to calculate readability statistics of a text object: paragraphs, sentences, articles. ☆1,182 · Updated last week
- Spellchecking library for Python ☆606 · Updated 7 months ago
- Streaming WARC/ARC library for fast web archive IO ☆398 · Updated 2 months ago
- Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations. ☆743 · Updated this week
- GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors ☆498 · Updated 5 years ago
- Multilingual word vectors in 78 languages ☆1,195 · Updated last year
- A simple interface to the Project Gutenberg corpus. ☆323 · Updated 2 years ago
- Wiktionary dump file parser and multilingual data extractor ☆855 · Updated this week
- A lightning fast Finite State machine and REgular expression manipulation library. ☆1,834 · Updated 2 months ago
- Repository for Frequency Word List Generator and processed files ☆1,215 · Updated 3 years ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆51 · Updated 3 years ago
- 🦆 Contextually-keyed word vectors ☆1,637 · Updated 10 months ago
- Bitextor generates translation memories from multilingual websites ☆293 · Updated 3 months ago
- Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way. ☆847 · Updated 3 weeks ago
- A vector search SQLite extension that runs anywhere! ☆4,840 · Updated 3 weeks ago
- Module for automatic summarization of text documents and HTML pages. ☆3,550 · Updated 9 months ago
- A simple interface for the CMU pronouncing dictionary ☆305 · Updated 6 months ago