google-research / nisaba
Finite-state script normalization and processing utilities
☆40Updated 2 weeks ago
Alternatives and similar repositories for nisaba
Users who are interested in nisaba are comparing it to the libraries listed below.
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages☆13Updated 2 years ago
- phone inventory library☆16Updated 2 years ago
- This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks.☆75Updated last year
- NTREX -- News Test References for MT Evaluation☆83Updated last year
- Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler☆24Updated 4 years ago
- Unicode Standard tokenization routines and orthography profile segmentation☆37Updated 3 months ago
- A tiny BERT for low-resource monolingual models☆31Updated 8 months ago
- This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to…☆45Updated 4 years ago
- 🕸 GlotWeb: Web Indexing for Low-Resource Languages -- under construction.☆13Updated 2 months ago
- Morfessor EM+Prune☆10Updated 4 years ago
- asr2k☆50Updated last year
- SIGMORPHON 2020 Shared Task: Grapheme-to-Phoneme, Unsupervised Induction of Morphology, and Typologically Diverse Morphological Inflectio…☆36Updated last month
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.☆27Updated last year
- A guide to building language technology in new languages.☆58Updated 3 years ago
- Read-only unofficial mirror of the OpenGrm NGram Library☆8Updated 6 years ago
- Multilingual Open Text☆25Updated 3 weeks ago
- A library for data streaming and augmentation☆20Updated last month
- Unsupervised spoken sentence embeddings☆14Updated 2 years ago
- ☆56Updated 2 years ago
- Kaldi style neural network training in pytorch for use in place of nnet3 in Kaldi.☆26Updated 10 months ago
- Large scale (>200h) and publicly available read audio book corpus. This corpus is an augmentation of LibriSpeech ASR Corpus (1000h) and c…☆43Updated 2 years ago
- A JAX library for building lattice-based speech transducer models☆44Updated 5 months ago
- Explore different ways to mix speech models (wav2vec2, HuBERT) and NLP models (BART, T5, GPT) together☆47Updated 2 years ago
- Read-only unofficial mirror of OpenFst☆44Updated 3 years ago
- Extracts plain text, language identification and more metadata from WARC records☆22Updated 3 months ago
- Scripts to create speech corpora from open.bible☆13Updated 3 years ago
- Repository for sharing the data in the Tamasheq language, one of the target languages for the low-resource speech translation track at IW…☆18Updated 2 years ago
- Sequence algorithms for use in Flashlight.☆14Updated 2 months ago
- This code provides a word-level language identification tool for identifying the language of individual words in code-mixed text, e.g. The tex…☆53Updated 4 years ago
- MAMMOTH: MAssively Multilingual Modular Open Translation @ Helsinki☆23Updated 3 months ago