Softcatala / julibert
Catalan BERT model
☆12Updated 4 years ago
Alternatives and similar repositories for julibert:
Users who are interested in julibert are comparing it to the repositories listed below:
- Compound splitter for German language ("Komposita-Zerlegung") based on large dictionary combined with highly efficient multi-pattern stri…☆22Updated 2 years ago
- Gamma Agreement in Python☆43Updated 10 months ago
- SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages☆8Updated 11 months ago
- An initiative to collect and distribute resources for co-reference resolution in a unified standard.☆24Updated 8 months ago
- Compiled tools, datasets, and other resources for historical text normalization.☆16Updated 5 years ago
- This is a German text corpus from Wikipedia. It is cleaned, preprocessed and sentence-split. Its purpose is to train NLP embeddings l…☆22Updated 2 years ago
- DeepSpeech ASR model for the Catalan language☆17Updated 3 years ago
- A repository for the 2022 Inflection Shared Task☆9Updated 2 years ago
- Low-Resource Neural Machine Translation: A Benchmark for Five African Languages☆15Updated 4 years ago
- SIGMORPHON 2022 Shared Task on Morpheme Segmentation☆24Updated last year
- coFR: COreference resolution tool for FRench (and singletons).☆24Updated 4 years ago
- Linguistic and stylistic complexity measures for (literary) texts☆79Updated 11 months ago
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation)☆38Updated last year
- Easier Automatic Sentence Simplification Evaluation☆160Updated last year
- Python Finite-State Toolkit☆47Updated last week
- A guide to building language technology in new languages.☆58Updated 2 years ago
- A tiny BERT for low-resource monolingual models☆31Updated 3 months ago
- This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks.☆72Updated last year
- A survey of corpora for Germanic low-resource languages and dialects☆24Updated last month
- A French sequence-to-sequence pretrained model☆57Updated 2 years ago
- Wav2Vec 2.0 Catalan training scripts and models☆12Updated 3 years ago
- A tokenizer and sentence splitter for German and English web and social media texts.☆137Updated last month
- Tool to fix bitexts and tag near-duplicates for removal☆29Updated 5 months ago