turnerdan/joethecorpusrogan

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/turnerdan/joethecorpusrogan)

turnerdan / joethecorpusrogan

A corpus of speech from the Joe Rogan Experience podcast, consisting of 8.43 million words. It includes aligned TextGrids with phonetic and word-level annotations.

☆21

Alternatives and similar repositories for joethecorpusrogan

Users that are interested in joethecorpusrogan are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

coryshain / dnnseg
View on GitHub
☆11Mar 20, 2021Updated 5 years ago
trecpodcasts / podcast-audio-feature-extraction
View on GitHub
Audio feature extraction and baseline search implementation for the Spotify Podcast Dataset.
☆12Sep 30, 2021Updated 4 years ago
ccoreilly / deepspeech-catala
View on GitHub
Deepspeech ASR Model for the Catalan Language
☆17Feb 15, 2021Updated 5 years ago
alumae / streaming-punctuator
View on GitHub
☆17Apr 14, 2023Updated 3 years ago
qiujiali / lattice-rescore
View on GitHub
☆16Jun 13, 2022Updated 4 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
burrmill / burrmill
View on GitHub
BurrMill core
☆22Nov 2, 2021Updated 4 years ago
noajshu / scotus-speech
View on GitHub
Corpus of oral arguments (recorded speech + official transcripts) of the United States Supreme Court
☆22Dec 8, 2022Updated 3 years ago
kate-egorova / ASR-hybrid-decoding
View on GitHub
This is an extension of kaldi speech recognition software which allows to perform decoding of speech with hybrid word and phoneme graphs.…
☆11Feb 4, 2020Updated 6 years ago
UKDataServiceOpen / agent-based-modelling
View on GitHub
Materials associated with the Agent-based Modelling training series
☆11Mar 18, 2022Updated 4 years ago
NLS-Digital-Scholarship / collections-as-data
View on GitHub
Jupyter Notebooks for reuse in analysis of National Library of Scotland's collections as data
☆12Oct 3, 2022Updated 3 years ago
daanzu / kaldi_ag_training
View on GitHub
Docker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-gramma…
☆21Jan 24, 2022Updated 4 years ago
igormq / ctcdecode-pytorch
View on GitHub
Python implementation of CTC beam search decoder + agnostic LM scorer
☆20Dec 16, 2020Updated 5 years ago
arysin / nlp_uk_api
View on GitHub
☆11Oct 19, 2024Updated last year
gpu-poor / gramvaani_hindi_asr
View on GitHub
This repo contains the baseline model recipes and pre-trained model for GramVanni hindi ASR challenge
☆16Mar 26, 2022Updated 4 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
UKDataServiceOpen / being-a-computational-social-scientist
View on GitHub
Materials associated with the Becoming a Computational Social Scientist training series
☆14May 22, 2023Updated 3 years ago
MiniXC / phones
View on GitHub
A collection of utilities for handling IPA phones.
☆27Sep 24, 2023Updated 2 years ago
m-wiesner / nnet_pytorch
View on GitHub
Kaldi style neural network training in pytorch for use in place of nnet3 in Kaldi.
☆26Jul 25, 2024Updated 2 years ago
desh2608 / kaldi-noise-vectors
View on GitHub
Implementation of different noise embeddings for noise aware training of Kaldi acoustic models.
☆13Feb 13, 2021Updated 5 years ago
PyThaiNLP / thai-g2p-wiktionary-corpus
View on GitHub
Thai Grapheme to Phoneme (G2P) Wiktionary Corpus
☆13Jul 25, 2022Updated 4 years ago
tiro-is / tiro-speech-core
View on GitHub
This is a mirror of https://gitlab.com/tiro-is/tiro-speech-core
☆15Jun 19, 2023Updated 3 years ago
psoulos / role-decomposition
View on GitHub
☆11Feb 11, 2020Updated 6 years ago
motazsaad / ara-pronunciation-tool
View on GitHub
A python tool that converts Arabic diacritised text to a sequence of phonemes and creates a pronunciation dictionary. This code is based …
☆15Sep 5, 2017Updated 8 years ago
aispeech-lab / TinyWASE
View on GitHub
PyTorch implementation of TinyWASE described in our paper "Compressing Speaker Extraction Model with Ultra-low Precision Quantization and…
☆11Jun 28, 2021Updated 5 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
keplerlab / enhant
View on GitHub
enhan(t) is an open source toolkit which enables you to enhance the web experience of existing video conferencing solutions like Zoom, MS…
☆15Apr 28, 2022Updated 4 years ago
matthewmorrone / cmudict-ipa
View on GitHub
CMU dictionary in IPA instead of their subset of Arpabet
☆16Jun 21, 2026Updated last month
dan-wells / kiss-aligner
View on GitHub
Simple Kaldi recipe for forced alignment
☆11Jul 16, 2023Updated 3 years ago
Freguglia / ggfocus
View on GitHub
A 'ggplot2' extension that provides tools for automatically creating scales to focus on subgroups of the data plotted without removin…
☆24Sep 14, 2025Updated 10 months ago
daanzu / wenet_stt_python
View on GitHub
☆33Nov 27, 2021Updated 4 years ago
pigzach / MagicSpeechASR
View on GitHub
magicspeech competition recipe
☆18Jun 29, 2020Updated 6 years ago
xinjli / phonepiece
View on GitHub
phone inventory library
☆17May 15, 2023Updated 3 years ago
NickRuiz / power-asr
View on GitHub
Phonetically-Oriented Word Error Rate
☆36May 4, 2019Updated 7 years ago
Prem-kumar27 / Fast-KTSpeechCrawler
View on GitHub
Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler
☆23Mar 21, 2021Updated 5 years ago
End-to-end encrypted cloud storage - Proton Drive • Ad
Special offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
google-research / last
View on GitHub
A JAX library for building lattice-based speech transducer models
☆48Jul 2, 2026Updated 3 weeks ago
rsprouse / xray_microbeam_database
View on GitHub
Annotations and scripts for use with University of Wisconsin X-Ray Microbeam Speech Production Database (1994)
☆14Oct 8, 2020Updated 5 years ago
k9luo / Punctuation-Restoration
View on GitHub
A TensorFlow Implementation of Punctuation Restoration.
☆18Nov 9, 2020Updated 5 years ago
athena-team / DiDiSpeech
View on GitHub
☆45Oct 24, 2020Updated 5 years ago
amazon-science / proteno
View on GitHub
This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to…
☆45May 25, 2021Updated 5 years ago
jim-schwoebel / voiceome
View on GitHub
🏥 🎤 The largest clinical study in the world to collect voice data labeled with health information (N>6,000 participants, 48 utterances…
☆32Apr 2, 2025Updated last year
NTRLab / MediaSpeech
View on GitHub
☆22Jul 22, 2022Updated 4 years ago