olastor / german-word-frequencies
Simple word-to-frequency mappings for the German language, based on text corpora and using the CISTEM stemmer.
☆11 · Updated 3 years ago
Alternatives and similar repositories for german-word-frequencies:
Users who are interested in german-word-frequencies are comparing it to the libraries listed below.
- Morphological Dictionaries for German Language ☆28 · Updated 6 years ago
- Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code ☆26 · Updated this week
- CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates ☆46 · Updated last year
- Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies) ☆27 · Updated 3 years ago
- A library for fetching and reading Tatoeba's weekly exports ☆21 · Updated last year
- Cog is a tool for comparing languages using lexicostatistics and comparative linguistics techniques. ☆23 · Updated last year
- Audiobook alignment for Indigenous languages ☆38 · Updated this week
- The Unicode Cookbook for Linguists ☆53 · Updated 4 years ago
- Wiktionary parser tool for many language editions. ☆54 · Updated 2 years ago
- An even smaller speech recognizer / force aligner ☆32 · Updated last month
- Aksharamukha Python Library ☆44 · Updated 3 months ago
- Post-processing OCR errors with seq2seq models ☆28 · Updated 4 years ago
- A text file containing English words, along with the definition, parts of speech (noun,verb,adjective,etc.), and a link to the url where … ☆10 · Updated 9 months ago
- A set of pipelines for performing experiments on various NLP tasks with a focus on resource-poor/minority languages. ☆35 · Updated this week
- Code for the paper: Wikinflection: Massive semi-supervised generation of multilingual inflectional corpus from Wiktionary (Metheniti and … ☆9 · Updated 4 years ago
- Unicode Standard tokenization routines and orthography profile segmentation ☆34 · Updated 2 years ago
- Multilingual tokenizer that automatically tags each token with its type ☆61 · Updated last year
- AfroLID, a powerful neural toolkit for African languages identification which covers 517 African languages. ☆30 · Updated last year
- The Data Format for Digital Linguistics (DaFoDiL) ☆22 · Updated last year
- Open Source AI Benchmarking toolkit for benchmarking speech to text services ☆55 · Updated 9 months ago
- Tools for scraping, annotating, and parsing morphological information from Wiktionary ☆13 · Updated 5 years ago
- A list of vocabulary lists ☆21 · Updated 4 years ago
- German Morphological Analyzer ☆47 · Updated 3 years ago
- These are lists for a variety of languages containing words that are distinctive to each language. ☆35 · Updated 2 years ago
- An NLP library for Uralic languages such as Finnish, Skolt Sami, Moksha and so on. Also supporting some non-Uralic languages such as Span… ☆73 · Updated 2 months ago
- Pronunciation dictionaries for several languages, based on Wiktionary data. ☆18 · Updated 3 years ago
- The Wikinflection Corpus, from the paper "Wikinflection Corpus: A (Better) Multilingual, Morpheme-Annotated Inflectional Corpus" (Metheni… ☆12 · Updated last year
- Finite state and Constraint Grammar based analysers and proofing tools, and language resources for the Plains Cree language ☆15 · Updated this week
- Best Practices in Translation Memory Management ☆45 · Updated 6 years ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆51 · Updated 3 years ago