olastor / german-word-frequencies
Simple word-to-frequency mappings for the German language, based on text corpora and using the CISTEM stemmer.
☆12 Updated 3 years ago
Alternatives and similar repositories for german-word-frequencies:
Users who are interested in german-word-frequencies are comparing it to the libraries listed below.
- A library for fetching and reading Tatoeba's weekly exports ☆22 Updated last year
- An NLP pipeline for Hebrew ☆36 Updated last week
- Script for workflow to add morphological analysis into ELAN files ☆13 Updated 4 years ago
- Morphological Dictionaries for German Language ☆28 Updated 6 years ago
- Python module for syllabifying English ARPABET transcriptions ☆66 Updated 6 years ago
- Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies) ☆27 Updated 3 years ago
- 🏆 • 5050 most frequent words in 109 languages ☆42 Updated 2 years ago
- German lemmatization with IWNLP as extension for spaCy ☆24 Updated last year
- Automatic Speech Recognition (ASR) - German ☆21 Updated 5 years ago
- Code for the paper: Wikinflection: Massive semi-supervised generation of multilingual inflectional corpus from Wiktionary (Metheniti and … ☆9 Updated 4 years ago
- linguistics backend ☆41 Updated last year
- A Python toolkit converting pronunciation in enwiktionary xml dump to cmudict format ☆33 Updated 5 years ago
- German part-of-speech dictionary ☆43 Updated last year
- Open Source AI Benchmarking toolkit for benchmarking speech to text services ☆55 Updated 10 months ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆153 Updated 3 months ago
- AfroLID, a powerful neural toolkit for African languages identification which covers 517 African languages. ☆31 Updated this week
- PurePos is an open source hybrid morphological tagger. ☆16 Updated 4 years ago
- CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates ☆47 Updated last year
- Cog is a tool for comparing languages using lexicostatistics and comparative linguistics techniques. ☆23 Updated last year
- Plan and train German transformer models. ☆23 Updated 4 years ago
- ☆22 Updated 2 years ago
- Unicode Standard tokenization routines and orthography profile segmentation ☆35 Updated 3 weeks ago
- Audiobook alignment for Indigenous languages ☆38 Updated 2 weeks ago
- Python Multilingual Ucrel Semantic Analysis System ☆31 Updated 6 months ago
- A baseline Automatic Speech Recognition system for Polish based on Kaldi. ☆18 Updated 3 years ago
- Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼 ☆22 Updated last month
- German Morphological Analyzer ☆47 Updated 3 years ago
- Parse and convert numbers written in French, English, Spanish, Portuguese, German and Catalan into their digit representation. ☆105 Updated last month
- Python code for converting among IPA, ARPABET, XSAMPA, Callhome, DISC, TIMIT, plus some lexical tones. ☆33 Updated last year