Compound splitter for the German language ("Komposita-Zerlegung", i.e. compound decomposition) based on a large dictionary combined with a highly efficient multi-pattern string search
☆35 · Jul 7, 2022 · Updated 3 years ago
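The description above names two ingredients: a large dictionary and a multi-pattern string search. Below is a minimal sketch of how they can fit together. It assumes an Aho-Corasick automaton for the multi-pattern search and a "fewest parts" rule for choosing among candidate splits. The class `AhoCorasick`, the function `split_compound`, the toy dictionary and the optional linking "s" (Fugen-s) are all hypothetical illustrations, not code or behaviour taken from the repository.

```python
from collections import deque


class AhoCorasick:
    """Hypothetical minimal Aho-Corasick automaton over a word list."""

    def __init__(self, words):
        self.goto = [{}]   # trie transitions, one dict per state
        self.fail = [0]    # failure link per state (0 = root)
        self.out = [[]]    # dictionary words recognised at each state
        for word in words:
            self._add(word)
        self._build_failure_links()

    def _add(self, word):
        state = 0
        for ch in word:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[state][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            state = nxt
        self.out[state].append(word)

    def _build_failure_links(self):
        # Breadth-first, so a state's failure target is always finished first.
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit words that end in the longest proper suffix.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def find_all(self, text):
        """Yield (start, end) spans of every dictionary word in text, in one pass."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for word in self.out[state]:
                yield (i - len(word) + 1, i + 1)


def split_compound(word, automaton, linkers=("s",)):
    """Hypothetical splitter: fewest dictionary parts covering word, else None."""
    text = word.lower()
    n = len(text)
    edges = {}  # start offset -> end offsets of dictionary words found there
    for start, end in automaton.find_all(text):
        edges.setdefault(start, []).append(end)
    best = [None] * (n + 1)  # best[i]: fewest-parts split of text[:i]
    best[0] = []
    for i in range(n):
        if best[i] is None:
            continue
        for end in edges.get(i, []):
            parts = best[i] + [text[i:end]]
            if best[end] is None or len(parts) < len(best[end]):
                best[end] = parts
            # Assumed rule: skip a linking element between two parts,
            # but never at the very end of the word.
            for link in linkers:
                j = end + len(link)
                if j < n and text.startswith(link, end):
                    if best[j] is None or len(parts) < len(best[j]):
                        best[j] = parts
    return best[n]
```

With the toy dictionary `["arbeit", "zimmer", "haus", "tür"]`, `split_compound("Arbeitszimmer", ac)` returns `["arbeit", "zimmer"]`, with the linking "s" skipped. If "haustür" is also in the dictionary, the fewest-parts rule keeps "Haustür" whole. A practical splitter would need a far larger dictionary and probably a better scoring rule than part count alone.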
Alternatives and similar repositories for german_compound_splitter
Users interested in german_compound_splitter are comparing it to the libraries listed below.
- Compound splitter for German · ☆113 · Apr 5, 2020 · Updated 6 years ago
- Repo for the simplified text alignment tools · ☆21 · Dec 4, 2020 · Updated 5 years ago
- Lyrics crawling, pre-processing, embedding generation, model training, and lyrics generation, all in one tool · ☆14 · Nov 4, 2018 · Updated 7 years ago
- Adnabod lleferydd Cymraeg i'r Gymraeg gyda HuggingFace // Speech Recognition for Welsh with HuggingFace · ☆13 · Nov 29, 2022 · Updated 3 years ago
- Alignment and annotation for comparable documents · ☆22 · Oct 16, 2018 · Updated 7 years ago
- COGS 543 - Computational Semantics · ☆15 · Jan 28, 2024 · Updated 2 years ago
- ☆13 · Apr 13, 2021 · Updated 5 years ago
- ☆12 · Jan 27, 2026 · Updated 3 months ago
- Python module to clean and transliterate (i.e. normalize) German text including abbreviations, numbers, timestamps etc. It can be used to… · ☆38 · Jan 16, 2021 · Updated 5 years ago
- A tokenizer and sentence splitter for German and English web and social media texts · ☆152 · Dec 9, 2024 · Updated last year
- Coding utilities for quantitative legal studies · ☆14 · Dec 7, 2025 · Updated 5 months ago
- Python code to automatically produce a summary of a piece of text · ☆12 · Sep 8, 2016 · Updated 9 years ago
- Extension for pie to include taggers with their models and pre/postprocessors · ☆11 · May 30, 2024 · Updated last year
- IPA Phonetic dataset lexicon · ☆18 · May 10, 2026 · Updated 2 weeks ago
- ☆14 · Jul 26, 2021 · Updated 4 years ago
- German Language Understanding Evaluation Benchmark @NAACL24 · ☆23 · Dec 11, 2025 · Updated 5 months ago
- Ukrainian ELECTRA model · ☆12 · Mar 11, 2023 · Updated 3 years ago
- Legal Reference Extraction · ☆47 · May 12, 2026 · Updated 2 weeks ago
- A lemmatizer for German language text · ☆95 · Feb 7, 2023 · Updated 3 years ago
- A neural network hyphenator for the German language · ☆45 · Oct 25, 2023 · Updated 2 years ago
- 🫠 check your data, before you wreck your model · ☆16 · Aug 11, 2022 · Updated 3 years ago
- Coqui Inference Engine · ☆41 · Aug 3, 2021 · Updated 4 years ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… · ☆29 · Apr 17, 2024 · Updated 2 years ago
- The source code for the TIRA Shared Task Platform · ☆17 · May 15, 2026 · Updated last week
- XWikisCorpus, cross-lingual summarisation, multi-lingual summarisation, pre-trained language models, zero-shot and few-shot summarisation… · ☆10 · Nov 4, 2022 · Updated 3 years ago
- Awesome stuff made by the Mycroft community · ☆13 · Sep 16, 2021 · Updated 4 years ago
- ☆14 · Aug 9, 2024 · Updated last year
- "Learning Rhyming Constraints using Structured Adversaries. Jhamtani H., Mehta S., Carbonell J., Berg-Kirkpatrick T. EMNLP-IJCNLP (Short … · ☆11 · Mar 17, 2020 · Updated 6 years ago
- Wikipedia text corpus for self-supervised NLP model training · ☆46 · Jul 17, 2022 · Updated 3 years ago
- GraphOfDocs: Representing multiple documents as a single graph · ☆21 · Jun 22, 2022 · Updated 3 years ago
- Download, parse, and filter data from Phil Papers. Data-ready for The-Pile. · ☆20 · Aug 28, 2023 · Updated 2 years ago
- Automatic Limerick Generation · ☆11 · Mar 18, 2021 · Updated 5 years ago
- The NLPStatTest project · ☆12 · Mar 12, 2022 · Updated 4 years ago
- ☆11 · Dec 8, 2022 · Updated 3 years ago
- TSAR2022 Shared Task on Lexical Simplification - Datasets and Evaluation scripts · ☆10 · Oct 27, 2022 · Updated 3 years ago
- SMOR (Stuttgart Morphology) with alternative lemmatization component · ☆13 · Aug 10, 2023 · Updated 2 years ago
- ⚙️ The backend for OffeneGesetze.de · ☆25 · Jan 11, 2024 · Updated 2 years ago
- Small-vocabulary neural sequence-to-sequence generation with optional feature conditioning · ☆36 · Updated this week
- Simple word-to-frequency mappings for the German language based on text corpora, using the CISTEM stemmer · ☆14 · Apr 3, 2021 · Updated 5 years ago