rspeer / wordfreq
Access a database of word frequencies, in various natural languages.
☆699 · Updated 2 months ago
Related projects:
- A Python Wiktionary Parser ☆358 · Updated 8 months ago
- The Open English WordNet ☆459 · Updated last week
- English word segmentation, written in pure Python and based on a trillion-word corpus. ☆364 · Updated last year
- Accurately generate all possible forms of an English word, e.g. "election" --> "elect", "electoral", "electorate" etc. ☆616 · Updated 3 years ago
- Python package to calculate readability statistics of a text object - paragraphs, sentences, articles. ☆1,129 · Updated 3 months ago
- Wiktionary dump file parser and multilingual data extractor ☆791 · Updated this week
- A modern, interlingual wordnet interface for Python ☆207 · Updated 9 months ago
- Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm… ☆791 · Updated 2 weeks ago
- A Python module for English lemmatization and inflection. ☆258 · Updated last year
- Heuristic-based boilerplate removal tool ☆717 · Updated 4 months ago
- NLP, before and after spaCy ☆2,206 · Updated 11 months ago
- Python stemming library using snowball stemmers ☆242 · Updated 2 weeks ago
- Port of Google's language-detection library to Python. ☆1,709 · Updated 7 months ago
- Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK. ☆1,060 · Updated last year
- 🦆 Contextually-keyed word vectors ☆1,617 · Updated 6 months ago
- Streaming WARC/ARC library for fast web archive IO ☆369 · Updated 3 weeks ago
- SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm ☆3,121 · Updated 5 months ago
- Multilingual text (NLP) processing toolkit ☆2,307 · Updated 10 months ago
- Just the facts -- web page content extraction ☆1,244 · Updated 2 months ago
- UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files ☆358 · Updated last week
- Text to sentence splitter using a heuristic algorithm by Philipp Koehn and Josh Schroeder. ☆225 · Updated last year
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆139 · Updated last month
- A Python parser for MediaWiki wikicode ☆742 · Updated 2 months ago
- Compact Language Detector 2 ☆836 · Updated 3 years ago
- A Python library to parse MediaWiki WikiText ☆285 · Updated last month
- Multilingual word vectors in 78 languages ☆1,193 · Updated last year
- A tool for transcribing orthographic text as IPA (International Phonetic Alphabet) ☆630 · Updated 3 weeks ago
- Gather modern English word frequencies from all enwiki articles. ☆198 · Updated 6 months ago
- 🪼 A Python library for doing approximate and phonetic matching of strings. ☆2,040 · Updated 2 weeks ago
- All languages stopwords collection ☆420 · Updated 8 months ago