AutoCorpus is a set of utilities that enable automatic extraction of language corpora and language models from publicly available datasets. AutoCorpus utilities follow the Unix design philosophy and integrate easily into custom data processing pipelines.
☆37 · Feb 1, 2012 · Updated 14 years ago
Alternatives and similar repositories for AutoCorpus
Users who are interested in AutoCorpus are comparing it to the libraries listed below.
- natural language processing with link-grammar ☆17 · Sep 30, 2009 · Updated 16 years ago
- This will hold the crowdsourcing platform to be used to store voice data from various speakers which will act as input dataset for speech… ☆17 · Mar 6, 2023 · Updated 3 years ago
- Perform the forced decoding with target transcription ☆11 · Sep 12, 2018 · Updated 7 years ago
- steps to perform text-based speaker diarization with kaldi toolkit ☆12 · Nov 2, 2018 · Updated 7 years ago
- Coqui STT (🐸STT) based forced alignment tool ☆13 · Feb 24, 2022 · Updated 4 years ago
- ☆18 · Apr 28, 2021 · Updated 5 years ago
- This is a mirror of https://gitlab.com/tiro-is/tiro-speech-core ☆15 · Jun 19, 2023 · Updated 2 years ago
- Speech Processing & Linguistic Analysis Tool ☆11 · Jun 30, 2019 · Updated 6 years ago
- Easier analysis of large speech corpora ☆24 · Jun 22, 2021 · Updated 4 years ago
- Grapheme to phoneme toolkit using joint-modelling + CRFs in java ☆14 · Jul 14, 2018 · Updated 7 years ago
- Tools for working with the CMU Pronunciation Dictionary ☆36 · Sep 5, 2017 · Updated 8 years ago
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests ☆15 · Jan 24, 2017 · Updated 9 years ago
- wake word spotting with kaldi ☆19 · Dec 3, 2020 · Updated 5 years ago
- A set of scripts to use in preparing a corpus for speech-to-text processing with the Kaldi Automatic Speech Recognition Library. ☆15 · May 19, 2020 · Updated 6 years ago
- Deploy Kaldi models using grpc for bidirectional streaming. ☆17 · Sep 30, 2024 · Updated last year
- A Python 3 compiler that anyone can understand. ☆68 · Jul 9, 2014 · Updated 11 years ago
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling. ☆27 · Feb 15, 2024 · Updated 2 years ago
- An app that graphs and compares the pitch contours of spoken language, to help language learners perfect their intonation (Hackbright Spr… ☆31 · Jul 20, 2017 · Updated 8 years ago
- Calculate remaining reading time estimates in real-time ☆24 · Sep 4, 2014 · Updated 11 years ago
- Recurrent Neural Network language modeling toolkit ☆38 · Jan 23, 2014 · Updated 12 years ago
- An Online Logic Assistant Based on Coq ☆25 · Feb 15, 2012 · Updated 14 years ago
- BurrMill core ☆22 · Nov 2, 2021 · Updated 4 years ago
- 📖 LanMIT: A Toolkit for Improving Language Models in Low-resourced Speech Recognition based on Kaldi. ☆22 · Jul 12, 2019 · Updated 6 years ago
- ☆22 · Jul 8, 2021 · Updated 4 years ago
- A free & open tool for transcribing audio interviews with offline ASR support ☆25 · Dec 21, 2023 · Updated 2 years ago
- A handy dataset of noises for ASR ☆22 · May 29, 2019 · Updated 7 years ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 · Apr 10, 2014 · Updated 12 years ago
- small python app to help practice speech shadowing, helpful for language learning ☆16 · Jun 25, 2020 · Updated 5 years ago
- Phonetic and phonological vocoding platform ☆17 · Nov 23, 2016 · Updated 9 years ago
- ☆25 · Jun 14, 2022 · Updated 4 years ago
- lazy generators with observation ☆14 · Nov 2, 2023 · Updated 2 years ago
- Audio Diarization Annotation tool ☆30 · Nov 8, 2019 · Updated 6 years ago
- ☆11 · Aug 6, 2015 · Updated 10 years ago
- Deep Boltzmann Machines in R^N dimensions ☆11 · May 14, 2014 · Updated 12 years ago
- Phone-level evaluation of L2 speakers (GOP algorithm) ☆27 · Mar 1, 2017 · Updated 9 years ago
- Some basic tools for interacting with `tcf-agent` ☆11 · Jan 19, 2024 · Updated 2 years ago
- Guess sentences from initial letters of each word ☆24 · Aug 22, 2022 · Updated 3 years ago
- Implementation of different noise embeddings for noise aware training of Kaldi acoustic models. ☆13 · Feb 13, 2021 · Updated 5 years ago
- bilingual dictionary extractor from parallel corpora ☆24 · Jul 3, 2014 · Updated 11 years ago