Softcatala / julibert
Catalan BERT model
☆12Updated 4 years ago
Alternatives and similar repositories for julibert
Users interested in julibert are comparing it to the libraries listed below
- Compound splitter for German language ("Komposita-Zerlegung") based on large dictionary combined with highly efficient multi-pattern stri…☆29Updated 3 years ago
- SIGMORPHON 2022 Shared Task on Morpheme Segmentation☆26Updated 2 years ago
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation)☆47Updated 2 years ago
- ☆42Updated 3 years ago
- ☆49Updated 11 months ago
- Linguistic processing for Common Voice☆55Updated last year
- Automatically exported from code.google.com/p/m2m-aligner☆42Updated 9 years ago
- Gamma Agreement in Python☆44Updated last year
- This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks.☆76Updated last year
- SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages☆9Updated last year
- The central repo for Creole based NLU and NLG work☆18Updated 2 months ago
- Deepspeech ASR Model for the Catalan Language☆17Updated 4 years ago
- Many ASRs under one roof. With Benchmarking... answering the question. What is the best ASR for my dataset?☆19Updated 2 years ago
- Phonetically-Oriented Word Error Rate☆35Updated 6 years ago
- Bicleaner fork that uses neural networks☆40Updated last month
- Complementary code for our paper Automatic punctuation restoration with BERT models☆50Updated last year
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023)☆74Updated 3 months ago
- Morphological Inflection for Low-Resource Languages using cross-lingual transfer☆20Updated 5 years ago
- Creating super-parallel corpora of more than 1500+ unique languages for NLP research☆34Updated 2 years ago
- Repository for SLURP paper☆103Updated 3 years ago
- ☆44Updated 3 years ago
- A merged version of multiple open-source German speech datasets.☆31Updated last year
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.☆158Updated last year
- 📃Language Model based sentences scoring library☆308Updated 3 years ago
- Training an n-gram based Language Model using KenLM toolkit for Deep Speech 2☆114Updated 6 years ago
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests☆14Updated 8 years ago
- Compound splitter for German☆107Updated 5 years ago
- SHAS: Approaching optimal Segmentation for End-to-End Speech Translation☆38Updated 2 years ago
- ITALIC: An ITALian Intent Classification Dataset☆14Updated last year
- Repository for Vajjala & Lucic (2018)☆65Updated last year