harshnative / words-dataset
A dataset of over 600,000 English words, each listed with its frequency
☆15Updated 3 years ago
Alternatives and similar repositories for words-dataset:
Users who are interested in words-dataset are comparing it to the repositories listed below
- English Lemma Database - Compiled by Referencing British National Corpus☆30Updated 7 months ago
- Gather modern English word frequencies from all enwiki articles.☆212Updated last year
- JavaScript port of SymSpell for Node.js☆13Updated 2 years ago
- Offline bilingual dictionaries made using data from Wiktionary☆54Updated 10 years ago
- RosaeNLG is a Natural Language Generation library for node.js and browser rendering, based on the Pug template engine.☆99Updated 3 months ago
- CLDR text segmentation for JavaScript☆38Updated 11 months ago
- A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning tech…☆72Updated 4 months ago
- Analyzes the given text and determines its vocabulary level based on CEFR levels☆45Updated 2 years ago
- NLP system for predicting the reading difficulty level of a text in terms of its CEFR level.☆52Updated 4 months ago
- An NLP pipeline for Hebrew☆37Updated last month
- Tool to generate paraphrases of sentences in many languages.☆84Updated 3 years ago
- Parse and convert numbers written in French, English, Spanish, Portuguese, German and Catalan into their digit representation.☆106Updated 2 months ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency☆154Updated 5 months ago
- Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code☆33Updated 2 months ago
- Transliteration for languages and dialects☆43Updated 2 years ago
- Pronunciation dictionaries for several languages, based on Wiktionary data.☆20Updated 3 years ago
- PyMultiDictionary is a dictionary module that gets meanings, translations, synonyms, and antonyms of words in 20 different languages☆50Updated last week
- A text file containing English words, along with the definition, parts of speech (noun, verb, adjective, etc.), and a link to the url where …☆12Updated 11 months ago
- The 134,000+ words and their pronunciations in the CMU pronouncing dictionary☆77Updated 3 years ago
- An NLP library for Uralic languages such as Finnish, Skolt Sami, Moksha and so on. Also supporting some non-Uralic languages such as Span…☆78Updated 4 months ago
- Multilingual syllable annotation pipeline component for spaCy☆39Updated 2 years ago
- Simple word-to-frequency mappings for the German language, based on text corpora and using the CISTEM stemmer.☆12Updated 4 years ago
- Lightweight string similarity function for JavaScript☆100Updated last year
- A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning tech…☆75Updated 7 months ago
- Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code☆64Updated last year
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder.☆244Updated 2 years ago
- A modern, interlingual wordnet interface for Python☆243Updated last week
- Grammalecte, the grammar checker in Python☆18Updated 4 months ago
- JS Trie / DAWG classes☆30Updated last year
- 🏆 • 5050 most frequent words in 109 languages☆42Updated 2 years ago