arahusky / diacritics_restoration
Neural-based model for automatic diacritics restoration.
☆22 · Updated 6 years ago
Related projects
Alternatives and complementary repositories for diacritics_restoration
- Small-vocabulary sequence-to-sequence generation with optional feature conditioning ☆31 · Updated this week
- Compound splitter for German ☆103 · Updated 4 years ago
- Repository for the word embeddings experiments described in "Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource", pre… ☆82 · Updated 3 years ago
- Romanian Named Entity Corpus (RONEC) version 2.0 ☆60 · Updated 2 years ago
- Use Language Model (LM) for Grammar Error Correction (GEC), without the use of annotated data. ☆80 · Updated 5 years ago
- Automatic extraction of edited sentences from text edition histories. ☆81 · Updated 2 years ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆150 · Updated 5 months ago
- Training an n-gram based Language Model using KenLM toolkit for Deep Speech 2 ☆112 · Updated 5 years ago
- A tool that locates, downloads, and extracts machine translation corpora ☆147 · Updated 5 months ago
- xfspell — the Transformer Spell Checker ☆187 · Updated 4 years ago
- A guide to building language technology in new languages. ☆57 · Updated 2 years ago
- 📃 Language Model based sentences scoring library ☆303 · Updated 2 years ago
- This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks. ☆72 · Updated last year
- spaCy + UDPipe ☆161 · Updated 2 years ago
- A simple library for querying the URIEL typological database. ☆88 · Updated 7 months ago
- LASER multilingual sentence embeddings as a pip package ☆225 · Updated last year
- Efficient Low-Memory Aligner ☆139 · Updated 2 months ago
- A Python 3 phonetics library. ☆124 · Updated 4 years ago
- Compound splitter for German language ("Komposita-Zerlegung") based on large dictionary combined with highly efficient multi-pattern stri… ☆22 · Updated 2 years ago
- Massively multilingual pronunciation mining ☆321 · Updated this week
- Improved Sentence Alignment in Linear Time and Space ☆163 · Updated last year
- Morfessor is a tool for unsupervised and semi-supervised morphological segmentation ☆185 · Updated 4 years ago
- Universal Romanizer that can convert any unicode script to roman (latin) script ☆154 · Updated 3 months ago
- Corpus preprocessing ☆95 · Updated 8 months ago
- The CODWOE shared task invites you to compare two types of semantic descriptions: dictionary glosses and word embedding representations. … ☆11 · Updated 2 years ago
- Punctuation restoration and spell correction experiments. ☆248 · Updated 3 years ago
- Python package and data files for manipulating phonological segments (phones, phonemes) in terms of universal phonological features. ☆221 · Updated 3 months ago
- Python module for syllabifying English ARPABET transcriptions ☆64 · Updated 5 years ago