AutoCorpus is a set of utilities that enable automatic extraction of language corpora and language models from publicly available datasets. AutoCorpus utilities follow the Unix design philosophy and integrate easily into custom data processing pipelines.
☆37 · Feb 1, 2012 · Updated 14 years ago
Alternatives and similar repositories for AutoCorpus
Users who are interested in AutoCorpus are comparing it to the repositories listed below.
- An application that helps people with dysarthria improve their pronunciation using deep learning ☆10 · Dec 29, 2020 · Updated 5 years ago
- ☆13 · Nov 16, 2022 · Updated 3 years ago
- Perform the forced decoding with target transcription ☆11 · Sep 12, 2018 · Updated 7 years ago
- steps to perform text-based speaker diarization with kaldi toolkit ☆12 · Nov 2, 2018 · Updated 7 years ago
- An app that graphs and compares the pitch contours of spoken language, to help language learners perfect their intonation (Hackbright Spr… ☆30 · Jul 20, 2017 · Updated 8 years ago
- Speech Processing & Linguistic Analysis Tool ☆11 · Jun 30, 2019 · Updated 6 years ago
- ☆17 · Apr 28, 2021 · Updated 4 years ago
- my internet website and web blog ☆17 · Jul 18, 2025 · Updated 7 months ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 · Apr 10, 2014 · Updated 11 years ago
- Coqui STT (🐸STT) based forced alignment tool ☆13 · Feb 24, 2022 · Updated 4 years ago
- Grapheme to phoneme toolkit using joint-modelling + CRFs in java ☆14 · Jul 14, 2018 · Updated 7 years ago
- This is a mirror of https://gitlab.com/tiro-is/tiro-speech-core ☆15 · Jun 19, 2023 · Updated 2 years ago
- Tools for working with the CMU Pronunciation Dictionary ☆36 · Sep 5, 2017 · Updated 8 years ago
- Easier analysis of large speech corpora ☆23 · Jun 22, 2021 · Updated 4 years ago
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests ☆14 · Jan 24, 2017 · Updated 9 years ago
- Calculate remaining reading time estimates in real-time ☆24 · Sep 4, 2014 · Updated 11 years ago
- Phonetic and phonological vocoding platform ☆17 · Nov 23, 2016 · Updated 9 years ago
- A set of scripts to use in preparing a corpus for speech-to-text processing with the Kaldi Automatic Speech Recognition Library. ☆15 · May 19, 2020 · Updated 5 years ago
- bilingual dictionary extractor from parallel corpora ☆23 · Jul 3, 2014 · Updated 11 years ago
- A proxy service to retrieve POIs (Points Of Interest) from several public services (Nominatim, Mapquest, Cloudmade, Geonames, Panoramio, … ☆28 · May 20, 2022 · Updated 3 years ago
- Deploy Kaldi models using grpc for bidirectional streaming. ☆17 · Sep 30, 2024 · Updated last year
- wake word spotting with kaldi ☆19 · Dec 3, 2020 · Updated 5 years ago
- The Kyoto Language Modeling Toolkit ☆27 · Nov 27, 2014 · Updated 11 years ago
- Microsoft Speech Language Translation (MSLT) Corpus ☆19 · Sep 18, 2017 · Updated 8 years ago
- BurrMill core ☆22 · Nov 2, 2021 · Updated 4 years ago
- A handy dataset of noises for ASR ☆22 · May 29, 2019 · Updated 6 years ago
- DKPro WSD: A Java framework for word sense disambiguation ☆20 · Nov 16, 2022 · Updated 3 years ago
- Docker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-gramma… ☆21 · Jan 24, 2022 · Updated 4 years ago
- 📖 LanMIT: A Toolkit for Improving Language Models in Low-resourced Speech Recognition based on Kaldi. ☆22 · Jul 12, 2019 · Updated 6 years ago
- Java interfaces and tools for Kaldi speech recognition. ☆20 · Oct 2, 2016 · Updated 9 years ago
- ☆22 · Jul 8, 2021 · Updated 4 years ago
- A free & open tool for transcribing audio interviews with offline ASR support ☆25 · Dec 21, 2023 · Updated 2 years ago
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling. ☆27 · Feb 15, 2024 · Updated 2 years ago
- Implicit relation extractor using a natural language model. ☆24 · May 25, 2018 · Updated 7 years ago
- ☆25 · Jun 14, 2022 · Updated 3 years ago
- Generalized Language Modeling toolkit ☆52 · Jun 21, 2022 · Updated 3 years ago
- Audio Diarization Annotation tool ☆30 · Nov 8, 2019 · Updated 6 years ago
- Compute association strength over semantic networks in a dimensionality-reduced form. ☆32 · Aug 14, 2015 · Updated 10 years ago
- BBB plugin for automatic subtitles in conference calls ☆29 · Apr 14, 2022 · Updated 3 years ago