Compound splitter for the German language ("Komposita-Zerlegung") based on a large dictionary combined with highly efficient multi-pattern string search
☆35 · Jul 7, 2022 · Updated 3 years ago
Alternatives and similar repositories for german_compound_splitter
Users that are interested in german_compound_splitter are comparing it to the libraries listed below.
- Compound splitter for German ☆113 · Apr 5, 2020 · Updated 6 years ago
- Repo for the simplified text alignment tools. ☆21 · Dec 4, 2020 · Updated 5 years ago
- Adnabod lleferydd Cymraeg i'r Gymraeg gyda HuggingFace // Speech Recognition for Welsh with HuggingFace ☆13 · Nov 29, 2022 · Updated 3 years ago
- Alignment and annotation for comparable documents. ☆22 · Oct 16, 2018 · Updated 7 years ago
- ☆12 · Jan 27, 2026 · Updated 2 months ago
- Python module to clean and transliterate (i.e. normalize) German text including abbreviations, numbers, timestamps etc. It can be used to… ☆37 · Jan 16, 2021 · Updated 5 years ago
- A tokenizer and sentence splitter for German and English web and social media texts. ☆153 · Dec 9, 2024 · Updated last year
- IPA Phonetic dataset lexicon ☆18 · Mar 20, 2026 · Updated 3 weeks ago
- Legal Reference Extraction ☆45 · Feb 13, 2026 · Updated 2 months ago
- A lemmatizer for German language text ☆94 · Feb 7, 2023 · Updated 3 years ago
- 🫠 check your data, before you wreck your model ☆16 · Aug 11, 2022 · Updated 3 years ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆29 · Apr 17, 2024 · Updated last year
- Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German ☆518 · Oct 30, 2024 · Updated last year
- The source code for the TIRA Shared Task Platform ☆17 · Apr 8, 2026 · Updated last week
- XWikisCorpus, cross-lingual summarisation, multi-lingual summarisation, pre-trained language models, zero-shot and few-shot summarisation… ☆10 · Nov 4, 2022 · Updated 3 years ago
- A C filesystem library designed for embedded devices with several kBytes RAM ☆21 · Feb 12, 2013 · Updated 13 years ago
- Awesome stuff made by the Mycroft community ☆13 · Sep 16, 2021 · Updated 4 years ago
- "Learning Rhyming Constraints using Structured Adversaries. Jhamtani H., Mehta S., Carbonell J., Berg-Kirkpatrick T. EMNLP-IJCNLP (Short … ☆11 · Mar 17, 2020 · Updated 6 years ago
- Wikipedia text corpus for self-supervised NLP model training ☆46 · Jul 17, 2022 · Updated 3 years ago
- Download, parse, and filter data from Phil Papers. Data-ready for The-Pile. ☆19 · Aug 28, 2023 · Updated 2 years ago
- Wrapper for the yr.no weather service API. ☆15 · Apr 12, 2018 · Updated 8 years ago
- Building an effective preprocessing tool for African languages ☆12 · Jan 24, 2024 · Updated 2 years ago
- The NLPStatTest project ☆12 · Mar 12, 2022 · Updated 4 years ago
- Poems retrieval demo built with GNES framework ☆14 · Oct 3, 2019 · Updated 6 years ago
- ☆11 · Dec 8, 2022 · Updated 3 years ago
- TSAR2022 Shared Task on Lexical Simplification - Datasets and Evaluation scripts ☆10 · Oct 27, 2022 · Updated 3 years ago
- SMOR (Stuttgart Morphology) with alternative lemmatization component ☆13 · Aug 10, 2023 · Updated 2 years ago
- text-to-speech alignment java software ☆20 · Aug 25, 2019 · Updated 6 years ago
- Simple word to frequency mappings for the German language based on text corpora and using CISTEM stemmer. ☆14 · Apr 3, 2021 · Updated 5 years ago
- A rolling version of the Latent Dirichlet Allocation. ☆13 · Nov 27, 2023 · Updated 2 years ago
- Interface for using TTS and vocoder models in the form of a text editor ☆19 · Nov 25, 2025 · Updated 4 months ago
- Poetry Corpora Annotated on Aesthetic Emotions ☆12 · Aug 2, 2022 · Updated 3 years ago
- Code to create the dataset from "A New Aligned Simple German Corpus" ☆12 · Jan 8, 2024 · Updated 2 years ago
- Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners" ☆29 · Apr 27, 2024 · Updated last year
- Scripts to simplify data prepping for Mozilla DeepSpeech. ☆14 · Aug 6, 2019 · Updated 6 years ago
- X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents (JCDL 2022) ☆14 · Jul 22, 2022 · Updated 3 years ago
- Very old C compilers ☆27 · Aug 12, 2014 · Updated 11 years ago
- Open German WordNet ☆100 · Jan 7, 2026 · Updated 3 months ago
- Home surveillance system with facial recognition ☆17 · Jun 10, 2020 · Updated 5 years ago