google-research / nisaba
Finite-state script normalization and processing utilities
☆38 Updated this week
Alternatives and similar repositories for nisaba:
Users who are interested in nisaba are comparing it to the libraries listed below.
- LTG-Bert ☆29 Updated last year
- A tiny BERT for low-resource monolingual models ☆31 Updated 3 months ago
- A JAX library for building lattice-based speech transducer models ☆41 Updated last month
- Unicode Standard tokenization routines and orthography profile segmentation ☆34 Updated 2 years ago
- phone inventory library ☆16 Updated last year
- A guide to building language technology in new languages. ☆58 Updated 2 years ago
- ☆56 Updated 2 years ago
- scripts for working with open.bible data ☆24 Updated 2 years ago
- Proposed splits for the LREC Wikipron paper ☆13 Updated 4 years ago
- Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler ☆23 Updated 3 years ago
- Suite for phonetic word embeddings, especially their evaluation and baseline models. ☆24 Updated 2 months ago
- 🫠 check your data, before you wreck your model ☆16 Updated 2 years ago
- This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks. ☆72 Updated last year
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆24 Updated 9 months ago
- NTREX -- News Test References for MT Evaluation ☆80 Updated 7 months ago
- ☆74 Updated 3 years ago
- Scripts to create speech corpora from open.bible ☆12 Updated 3 years ago
- MAMMOTH: MAssively Multilingual Modular Open Translation @ Helsinki ☆22 Updated last month
- Implementation of "SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages" paper, accepted to E… ☆20 Updated 2 years ago
- Curriculum training ☆16 Updated this week
- Unsupervised spoken sentence embeddings ☆14 Updated 2 years ago
- Bicleaner fork that uses neural networks ☆39 Updated 5 months ago
- Second SIGMORPHON Shared Task on Grapheme-to-Phoneme Conversions ☆22 Updated 3 years ago
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages ☆13 Updated 2 years ago
- Ukrainian ELECTRA model ☆12 Updated last year
- Multilingual Open Text ☆25 Updated 2 months ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆49 Updated this week
- A library for data streaming and augmentation ☆20 Updated 10 months ago
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling. ☆27 Updated 11 months ago
- Morfessor EM+Prune ☆10 Updated 4 years ago