sarulab-speech/audio-foundation-model-dataset

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/sarulab-speech/audio-foundation-model-dataset)

sarulab-speech / audio-foundation-model-dataset

☆65

Alternatives and similar repositories for audio-foundation-model-dataset

Users that are interested in audio-foundation-model-dataset are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

kyamauchi1023 / PL-BERT-ja
View on GitHub
A repository of Japanese Phoneme-Level BERT
☆24Dec 16, 2023Updated 2 years ago
line / WaveTrainerFit
View on GitHub
Official implementation of "Wave-Trainer-Fit: Neural Vocoder with Trainable Prior and Fixed-Point Iteration towards High-Quality Speech G…
☆16Feb 6, 2026Updated 5 months ago
ryota-komatsu / slp2025
View on GitHub
Survey of audio language models
☆65Apr 18, 2026Updated 3 months ago
hirokisince1998 / jasj-bibtex
View on GitHub
日本音響学会誌用BibTeXスタイルファイル
☆11Jan 24, 2022Updated 4 years ago
hhguo / SoCodec
View on GitHub
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
☆92Dec 20, 2024Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
sarulab-speech / Coco-Nut
View on GitHub
Coco-Nut (Corpus of connecting NIHONGO utterance and text) corpus
☆21Jun 12, 2024Updated 2 years ago
wetdog / wavenext_pytorch
View on GitHub
Unofficial implementation of wavenext vocoder
☆59Aug 28, 2024Updated last year
unilight / sheet
View on GitHub
Speech Human Evaluation Estimation Toolkit (SHEET)
☆138Mar 31, 2026Updated 3 months ago
mcf330 / efts2code
View on GitHub
source code of EfficientTTS 2
☆21Feb 18, 2024Updated 2 years ago
unilight / jatts
View on GitHub
JATTS: A modern, research-oriented Japanese Text-to-speech Open-sourced Toolkit
☆43Mar 13, 2026Updated 4 months ago
LAION-AI / emotional-speech-annotations
View on GitHub
This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models
☆35Oct 13, 2024Updated last year
sarulab-speech / xvector_jtubespeech
View on GitHub
xvector model on jtubespeech
☆47Nov 5, 2023Updated 2 years ago
kotoba-tech / Open-GPT-4o
View on GitHub
☆10May 16, 2024Updated 2 years ago
ajaybati / miipher2.0
View on GitHub
Reimplementation of Miipher
☆30Aug 16, 2023Updated 2 years ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
yukara-ikemiya / wavefit-pytorch
View on GitHub
PyTorch implementation of WaveFit [2022, Google] which is one of SOTA lightweight/fast speech vocoders.
☆70Jul 13, 2026Updated 2 weeks ago
reppy4620 / vocoders
View on GitHub
My vocoder experiments
☆31Jul 26, 2025Updated last year
yukara-ikemiya / Open-Miipher-2
View on GitHub
PyTorch implementation of Miipher-2 [2025] which is a speech restoration model by Google DeepMind
☆70Sep 22, 2025Updated 10 months ago
ArenAcikgoz / Whisper-Alignment
View on GitHub
Forced alignment decoder for Whisper.
☆16Mar 13, 2024Updated 2 years ago
sarulab-speech / UTMOS22
View on GitHub
UT-Sarulab MOS prediction system using SSL models
☆309Apr 11, 2024Updated 2 years ago
line / LibriTTS-P
View on GitHub
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
☆161Jun 13, 2024Updated 2 years ago
tonnetonne814 / SiFi-VITS2-44100-Ja
View on GitHub
DDPM-based Pitch Generation and Pitch Controllable Voice Synthesis.
☆55Sep 25, 2023Updated 2 years ago
KoheiYatabe / DGTtool
View on GitHub
A simple and user-friendly tool for computing STFT/DGT
☆19Jun 22, 2021Updated 5 years ago
MaAI-Kyoto / MaAI
View on GitHub
A real-time software for turn-taking, backchannel, and head-nodding prediction
☆107Jul 21, 2026Updated last week
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
tsukumijima / pyopenjtalk-plus
View on GitHub
pyopenjtalk-plus: A Python wrapper for OpenJTalk with additional improvements
☆58Jul 18, 2026Updated last week
ajd12342 / paraspeechcaps
View on GitHub
Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'
☆165Mar 26, 2026Updated 4 months ago
HidekiKawahara / SafeguardingTestSounds
View on GitHub
Acoustic measurement using music pieces
☆12Aug 5, 2022Updated 3 years ago
kotoba-tech / kotoba-speech-release
View on GitHub
☆49Jul 22, 2024Updated 2 years ago
SakanaAI / kame_finetune
View on GitHub
☆30Jul 16, 2026Updated last week
0nutation / USLM
View on GitHub
Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"(ICLR 2024)
☆152Sep 14, 2023Updated 2 years ago
6gsn / marine
View on GitHub
☆38Sep 20, 2022Updated 3 years ago
0nutation / SLMTokBench
View on GitHub
SLMTokBench for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"
☆37Aug 29, 2023Updated 2 years ago
mmorise / beginning_asa
View on GitHub
ひたすら楽して音響信号解析：プログラム置き場
☆28Jan 25, 2024Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
yoongi43 / VRVQ
View on GitHub
Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"
☆11Apr 10, 2025Updated last year
Mrunal-G / Casual-turn-taking-and-backchannel-prediction
View on GitHub
☆16Jun 25, 2024Updated 2 years ago
Parakeet-Inc / J-HARD-TTS-Eval
View on GitHub
☆21Jan 28, 2026Updated 6 months ago
wavlab-speech / versa
View on GitHub
Versatile Evaluation of Speech and Audio
☆425Jul 21, 2026Updated last week
rishikksh20 / multiband-hifigan
View on GitHub
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
☆45Mar 2, 2021Updated 5 years ago
slSeanWU / beats-conformer-bart-audio-captioner
View on GitHub
PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv…
☆41Jan 6, 2024Updated 2 years ago
introlab / uimvdr
View on GitHub
☆13Oct 11, 2024Updated last year