voiceboxneurips/voicebox

☆25

Alternatives and similar repositories for voicebox

Users who are interested in voicebox are comparing it to the repositories listed below.

YangzlTHU / VStego800K
☆11 · Mar 28, 2021 · Updated 5 years ago
RanaCM / DSU-AVO
Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023
☆12 · May 13, 2024 · Updated 2 years ago
nlml / deconstruct-mediapipe
☆15 · Mar 20, 2024 · Updated 2 years ago
ErikEkstedt / conv_ssl
☆14 · Feb 9, 2023 · Updated 3 years ago
NTIA / alignnet
Train no-reference speech quality estimators with multiple datasets via learned, per-dataset alignments.
☆18 · Aug 1, 2025 · Updated 11 months ago
idiap / zff_vad
Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering
☆23 · Oct 19, 2023 · Updated 2 years ago
RaphaelOlivier / whisper_attack
☆23 · Apr 3, 2025 · Updated last year
ljuvela / SourceFilterNeuralFormants
☆21 · Sep 20, 2024 · Updated last year
jdonley / Speech-Dereverberation-and-RIR-Estimation
☆15 · Apr 18, 2023 · Updated 3 years ago
anton-jeran / Speech2RIR
This is the official implementation of reverberant speech to room impulse response estimator
☆42 · Aug 7, 2024 · Updated last year
ccc013 / CodesNotes
Codes when studying from books or tutorials
☆12 · Sep 22, 2020 · Updated 5 years ago
mileskuo42 / AudioMarkBench
Dataset/code for AudioMarkBench: Benchmarking Robustness of Audio Watermarking
☆48 · Aug 23, 2024 · Updated last year
shivammehta25 / BetterFastSpeech2
Just another FastSpeech 2 but cleaner code :)
☆29 · Jun 28, 2024 · Updated 2 years ago
tuanct1997 / Federated-Learning-ASR-based-on-wav2vec-2.0
☆18 · Mar 13, 2024 · Updated 2 years ago
stefantaubert / mel-cepstral-distance
A Python library for computing the Mel-Cepstral Distance (Mel-Cepstral Distortion, MCD) between two inputs. This implementation is based …
☆67 · Aug 24, 2025 · Updated 11 months ago
rainavyas / prepend_acoustic_attack
Prepend universal audio attack segment to mute Whisper
☆41 · Jan 22, 2025 · Updated last year
interactiveaudiolab / CAQE
Crowdsourced Audio Quality Evaluation Toolkit
☆55 · Dec 7, 2022 · Updated 3 years ago
ms-dot-k / TMT
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
☆18 · May 23, 2024 · Updated 2 years ago
fengnian123 / qwen-2.5-omni-realtime-chat
Uses the fastrtc framework to call qwen-2.5-omni-realtime for real-time voice, video, and more
☆14 · Jun 27, 2025 · Updated last year
RichardoMrMu / yolov5-reflective-clothes-detect-python
A Python training and inference implementation of Yolov5 reflective clothes and helmet detection
☆20 · Dec 2, 2021 · Updated 4 years ago
Lhx94As / E2E-language-diarization
Source code of paper <End-to-End Language Diarization for Bilingual Code-switching Speech>
☆19 · Jan 23, 2022 · Updated 4 years ago
SDchao / Time2Rest
Protect your eyes to see the world!
☆11 · Oct 16, 2021 · Updated 4 years ago
lifeiteng / NaturalSpeech2
☆33 · Jun 29, 2023 · Updated 3 years ago
cocosci / pam-nac
Psychoacoustic Calibration for Efficient Neural Audio Coding
☆26 · Sep 26, 2023 · Updated 2 years ago
zjlww / zjlww.github.io
☆12 · Feb 26, 2023 · Updated 3 years ago
adityaShar24 / Social-Media-Backend
Social-Media is an open-source social networking platform built with Python and Flask. It provides a simple and customizable foundation f…
☆14 · Nov 27, 2023 · Updated 2 years ago
Audio-WestlakeU / VINP
Official PyTorch implementation of 'VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverb…
☆36 · Feb 23, 2026 · Updated 5 months ago
PecholaL / MAIN-VC
Lightweight Speech Representation Learning for One-Shot Voice Conversion
☆23 · Dec 12, 2024 · Updated last year
zaocan666 / DyViSE
Dynamic vision-guided speaker embedding for audio-visual speaker diarization
☆12 · Jul 5, 2022 · Updated 4 years ago
ShiningLab / POS-Tagger-for-Punctuation-Restoration
This repository is for the paper Incorporating External POS Tagger for Punctuation Restoration. Proc. Interspeech 2021, 1987-1991, doi: 1…
☆11 · May 24, 2026 · Updated 2 months ago
jasonppy / syllable-discovery
Syllable Segmentation and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model
☆35 · Aug 27, 2023 · Updated 2 years ago
YangAi520 / NSPP
☆55 · Mar 2, 2023 · Updated 3 years ago
light1726 / BetaVAE_VC
Implementation for paper "Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using ß-VAE"
☆43 · Apr 10, 2023 · Updated 3 years ago
MaitySubhajit / KArAt
Kolmogorov-Arnold Attention: Is Learnable Attention Better for Vision Transformers?
☆16 · Jul 9, 2025 · Updated last year
ShovalMessica / NAST
Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11…
☆46 · Jul 2, 2024 · Updated 2 years ago
cyhuang-tw / attack-vc
The official implementation of the paper "Defending Your Voice: Adversarial Attack on Voice Conversion".
☆53 · May 15, 2024 · Updated 2 years ago
Sreyan88 / LipGER
Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
☆19 · Jul 16, 2024 · Updated 2 years ago
shrezaei / Target-Agnostic-Attack
Target Agnostic Attack on Deep Models: Exploiting Security Vulnerabilities of Transfer Learning
☆10 · Jul 2, 2019 · Updated 7 years ago
gteu / realtime-ppg-vc
Voice conversion model for real-time speech synthesis using PPG (Phonetic PosteriorGram) as an intermediate feature, written in Pytorch.
☆29 · Mar 3, 2022 · Updated 4 years ago