mpacula/AutoCorpus

AutoCorpus is a set of utilities that enable automatic extraction of language corpora and language models from publicly available datasets. AutoCorpus utilities follow the Unix design philosophy and integrate easily into custom data processing pipelines.

☆37
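
To illustrate the Unix-filter style described above, here is a minimal, hypothetical sketch in Python. It is not AutoCorpus's own code or command-line interface. It is a filter that reads plain text on stdin and writes tab-separated n-gram counts to stdout, which is the kind of count table an n-gram language model is typically estimated from. Whitespace tokenization and the optional order argument are assumptions made for the example.

```python
import sys
from collections import Counter
from typing import Iterable, List


def count_ngrams(lines: Iterable[str], n: int = 2) -> Counter:
    """Count whitespace-tokenized n-grams; windows never cross line boundaries."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    counts: Counter = Counter()
    for line in lines:
        tokens = line.split()
        # Slide a window of width n across this line's tokens.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def format_counts(counts: Counter) -> List[str]:
    """Render counts as 'w1 w2 ...<TAB>count' lines, most frequent first."""
    return [f"{' '.join(ngram)}\t{count}" for ngram, count in counts.most_common()]


def main() -> None:
    # Optional first argument selects the n-gram order (default: bigrams).
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    # Text in on stdin, counts out on stdout, so the tool composes in pipelines.
    for row in format_counts(count_ngrams(sys.stdin, n)):
        print(row)


if __name__ == "__main__":
    main()
```

Because the sketch only reads stdin and writes stdout, it chains with standard tools, for example `python ngram_counts.py 3 < corpus.txt | head -n 20` (the script name is hypothetical).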

Alternatives and similar repositories for AutoCorpus

Users interested in AutoCorpus are comparing it to the libraries listed below.

Open-Speech-EkStep / crowdsource-dataplatform
This will hold the crowdsourcing platform to be used to store voice data from various speakers which will act as input dataset for speech…
☆17 · Mar 6, 2023 · Updated 3 years ago

Minzard / Correctable-Pronunciation
An application that helps people with dysarthria improve their pronunciation using deep learning
☆10 · Dec 29, 2020 · Updated 5 years ago

homink / kaldi-asr.forced_decoding
Performs forced decoding with a target transcription
☆11 · Sep 12, 2018 · Updated 7 years ago

talhanai / kaldi-diar-latte
Steps to perform text-based speaker diarization with the Kaldi toolkit
☆12 · Nov 2, 2018 · Updated 7 years ago

meyersbs / SPLAT
Speech Processing & Linguistic Analysis Tool
☆11 · Jun 30, 2019 · Updated 7 years ago

gullabi / STT-align
Coqui STT (🐸STT)-based forced alignment tool
☆13 · Feb 24, 2022 · Updated 4 years ago

TalnUPF / praat_web
☆13 · Jun 30, 2026 · Updated 3 weeks ago

Open-Speech-EkStep / data-acquisition-pipeline
☆18 · Apr 28, 2021 · Updated 5 years ago

steveash / jg2p
Grapheme-to-phoneme toolkit in Java using joint modelling + CRFs
☆15 · Jul 14, 2018 · Updated 8 years ago

CoEDL / kaldi_helpers
A set of scripts to use in preparing a corpus for speech-to-text processing with the Kaldi Automatic Speech Recognition Library.
☆15 · May 19, 2020 · Updated 6 years ago

declanoller / haskell-vae
Learning about Haskell with Variational Autoencoders
☆19 · Nov 16, 2019 · Updated 6 years ago

isi-nlp / carmel
Finite-state toolkit; EM and Bayesian (Gibbs sampling) training for FSTs and context-free derivation forests
☆15 · Jan 24, 2017 · Updated 9 years ago

JarbasAl / kaldi_spotter
Wake word spotting with Kaldi
☆19 · Dec 3, 2020 · Updated 5 years ago

darius / amphigory
Metrical rhyming verse in JavaScript
☆16 · Jun 13, 2013 · Updated 13 years ago

danafallon / IntonationCoach
An app that graphs and compares the pitch contours of spoken language, to help language learners perfect their intonation (Hackbright Spr…
☆32 · Jul 20, 2017 · Updated 9 years ago

egorsmkv / asr-corpus-creator
This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.
☆27 · Feb 15, 2024 · Updated 2 years ago

katakombi / rnnlm
Recurrent Neural Network language modeling toolkit
☆38 · Jan 23, 2014 · Updated 12 years ago

burrmill / burrmill
BurrMill core
☆22 · Nov 2, 2021 · Updated 4 years ago

MontrealCorpusTools / speechcorpustools
Easier analysis of large speech corpora
☆24 · Jun 22, 2021 · Updated 5 years ago

Luracast / Restler-Application
Application boilerplates for Restler. Each branch contains a flavor; find the one that suits you.
☆11 · Jun 13, 2021 · Updated 5 years ago

tmetsch / dtrace-web-ide
Python-based, web-based IDE for DTrace with data visualizations
☆15 · Jun 13, 2012 · Updated 14 years ago

charlesliucn / LanMIT
📖 LanMIT: A Toolkit for Improving Language Models in Low-resourced Speech Recognition based on Kaldi.
☆22 · Jul 12, 2019 · Updated 7 years ago

projecte-aina / oTranscribe-plus
A free & open tool for transcribing audio interviews with offline ASR support
☆25 · Dec 21, 2023 · Updated 2 years ago

speechio / asr-noises
A handy dataset of noises for ASR
☆22 · May 29, 2019 · Updated 7 years ago

KamalaSowmya / DiscussionSummarization
Discussion Summarization is the process of condensing a text document, which is a collection of discussion threads, using CBS (Cluster Bas…
☆12 · Apr 10, 2014 · Updated 12 years ago

idiap / phonvoc
Phonetic and phonological vocoding platform
☆17 · Nov 23, 2016 · Updated 9 years ago

danijel3 / KaldiJava
Java interfaces and tools for Kaldi speech recognition.
☆20 · Oct 2, 2016 · Updated 9 years ago

rsattar / RSNetflixEngine
A useful library to communicate with the Netflix API
☆13 · Jan 25, 2012 · Updated 14 years ago

namin / 3-proto-lisp
Code from the paper "Reflection for the Masses" by Charlotte Herzeel, Pascal Costanza, and Theo D'Hondt.
☆15 · Jun 21, 2021 · Updated 5 years ago

graydon / exhaustigen
☆12 · Nov 16, 2021 · Updated 4 years ago

emoreno619 / sentimentAnalysis
This Node.js app, built with MongoDB, lets users compare a restaurant's Yelp and Google+ scores at the same time. It uses …
☆10 · Oct 18, 2015 · Updated 10 years ago

tscohen / HarmonicExponentialFamily
Code for Harmonic Exponential Families on Manifolds
☆10 · Jun 2, 2016 · Updated 10 years ago

ChaseBro / MMDAgent
Implementation of the MMDAgent for use as a live receptionist in Carnegie Mellon's School of Computer Science.
☆16 · Apr 11, 2013 · Updated 13 years ago

smart-audio / audio_diarization_annotation
Audio diarization annotation tool
☆30 · Nov 8, 2019 · Updated 6 years ago

desh2608 / kaldi-noise-vectors
Implementation of different noise embeddings for noise-aware training of Kaldi acoustic models.
☆13 · Feb 13, 2021 · Updated 5 years ago

rctn / DeepBoltzmannRN
Deep Boltzmann Machines in R^N dimensions
☆11 · May 14, 2014 · Updated 12 years ago

commonsense / luminoso
A visualizer for multi-dimensional semantic data
☆38 · Oct 24, 2011 · Updated 14 years ago

Dthurow / SpeechShadowing
Small Python app to help practice speech shadowing, useful for language learning
☆16 · Jun 25, 2020 · Updated 6 years ago

shapr / trynocular
Lazy generators with observation
☆14 · Nov 2, 2023 · Updated 2 years ago