olastor / german-word-frequencies
Simple word-to-frequency mappings for the German language, based on text corpora and using the CISTEM stemmer.
☆12 · Updated 4 years ago
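The core idea (collapsing corpus word counts onto stems so that inflected forms share one frequency) can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code or data format. `toy_stem` is a hypothetical stand-in that only folds umlauts and strips one common suffix; it is **not** CISTEM. The stemmer is a parameter, so a real implementation can be swapped in (for example, assuming NLTK is installed, `nltk.stem.cistem.Cistem().stem`).

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical placeholder stemmer: NOT the CISTEM algorithm.
# It lowercases, folds umlauts and ß, then strips at most one suffix
# while keeping a stem of at least three characters.
def toy_stem(word: str) -> str:
    w = word.lower()
    for src, dst in (("ä", "a"), ("ö", "o"), ("ü", "u"), ("ß", "ss")):
        w = w.replace(src, dst)
    for suffix in ("en", "er", "em", "es", "e", "n", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

def stem_frequencies(
    tokens: Iterable[str], stem: Callable[[str], str] = toy_stem
) -> Counter:
    """Count alphabetic tokens under their stems, skipping punctuation."""
    return Counter(stem(t) for t in tokens if t.isalpha())

if __name__ == "__main__":
    tokens = ["Hund", "Hunde", "Hunden", "Katze", "Katzen", "Bär", "Bären", "und", ","]
    freqs = stem_frequencies(tokens)
    print(freqs.most_common())
```

With the sample tokens, "Hund", "Hunde" and "Hunden" all fall under the stem `hund`, so that stem gets a count of 3; the comma is ignored. Passing the stemmer as a function keeps the counting logic independent of the stemming algorithm.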
Alternatives and similar repositories for german-word-frequencies
Users interested in german-word-frequencies are comparing it to the libraries listed below.
- Character-level conversion between Hebrew text and Latin transliteration using deep learning: a demonstration of seq2seq training. ☆13 · Updated 2 years ago
- Aksharamukha Python Library ☆50 · Updated 5 months ago
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆103 · Updated last month
- Wiktra: a Python tool of Wiktionary transliteration modules for 514 languages and their 102 different scripts (orthographies) ☆30 · Updated 2 weeks ago
- An NLP library for Uralic languages such as Finnish, Skolt Sami, Moksha and so on. Also supporting some non-Uralic languages such as Span… ☆81 · Updated last month
- CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates ☆49 · Updated 2 years ago
- An NLP pipeline for Hebrew ☆38 · Updated last month
- ☆74 · Updated 3 months ago
- A library for fetching and reading Tatoeba's weekly exports ☆23 · Updated last year
- A character-wise tokenizer for morphologically rich languages ☆27 · Updated 4 months ago
- A French lemmatizer in Python based on the LEFFF ☆42 · Updated 5 years ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆166 · Updated last month
- AfroLID, a powerful neural toolkit for African language identification that covers 517 African languages. ☆31 · Updated 4 months ago
- The Data Format for Digital Linguistics (DaFoDiL) ☆22 · Updated 2 years ago
- Audiobook alignment for Indigenous languages ☆40 · Updated last week
- A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as a CSV file, plus a module to look up the dat … ☆153 · Updated 6 months ago
- Multilingual syllable annotation pipeline component for spaCy ☆39 · Updated 2 years ago
- Code for transliterating (romanizing) Arabic text using the American Library Association - Library of Congress (ALA-LC) standard ☆47 · Updated 3 years ago
- A workflow script for adding morphological analysis to ELAN files ☆13 · Updated 5 years ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆51 · Updated last week
- Massively multilingual pronunciation mining ☆344 · Updated last month
- An advanced, extensible web front-end for the Manatee-open corpus search engine ☆70 · Updated last week
- A cloud-based, open-source system for writing and publishing dictionaries. ☆93 · Updated last year
- Finite state and Constraint Grammar based analysers and proofing tools, and language resources for the Plains Cree language ☆16 · Updated last week
- A list of vocabulary lists ☆21 · Updated 5 years ago
- The Metadata Editor for Transparent Archiving of language document materials ☆20 · Updated 2 months ago
- An open-source AI benchmarking toolkit for speech-to-text services ☆56 · Updated last year
- Benchmark Arabic text diacritization dataset ☆75 · Updated 5 years ago
- A neural parsing pipeline for segmentation, morphological tagging, dependency parsing and lemmatization with pre-trained models for more … ☆113 · Updated last year
- Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code ☆37 · Updated 5 months ago