FireRedTeam/FireRedVAD

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/FireRedTeam/FireRedVAD)

FireRedTeam / FireRedVAD

A SOTA Industrial-Grade Voice Activity Detection & Audio Event Detection, supporting 100+ languages, outperforming Silero-VAD, TEN-VAD, FunASR-VAD and WebRTC-VAD

☆472

Alternatives and similar repositories for FireRedVAD

Users that are interested in FireRedVAD are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

FireRedTeam / FireRedASR2S
View on GitHub
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/…
☆619Jun 2, 2026Updated last month
ASLP-lab / Easy-Turn
View on GitHub
Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems
☆122Jan 25, 2026Updated 6 months ago
xiquan-li / MeanAudio
View on GitHub
[ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
☆145Sep 2, 2025Updated 10 months ago
Soul-AILab / SoulX-Transcriber
View on GitHub
An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.
☆284Jun 22, 2026Updated last month
xiaomi-research / dasheng-tokenizer
View on GitHub
State-of-the-art continious audio tokenization
☆40Mar 9, 2026Updated 4 months ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
k2-fsa / ZipVoice
View on GitHub
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
☆1,021Dec 2, 2025Updated 7 months ago
TEN-framework / ten-vad
View on GitHub
Voice Activity Detector (VAD) : low-latency, high-performance and lightweight
☆2,210Feb 2, 2026Updated 5 months ago
Soul-AILab / SoulX-Duplug
View on GitHub
Plug-and-play streaming semantic VAD for real-time full-duplex spoken dialogue systems.
☆278Jul 17, 2026Updated last week
ASLP-lab / Speaker-Reasoner
View on GitHub
Speaker-Reasoner: Scaling Interaction Turns and Reasoning Patterns for Timestamped Speaker-Attributed ASR
☆93May 13, 2026Updated 2 months ago
xiaomi-research / dasheng-audiogen
View on GitHub
end-to-end text to audio scene generation model
☆50Jun 16, 2026Updated last month
ASLP-lab / FlashTTS
View on GitHub
Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation
☆67Jun 16, 2026Updated last month
wenet-e2e / west
View on GitHub
We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction
☆206Jul 17, 2026Updated last week
pengzhendong / audiolab
View on GitHub
A streaming audio reader, processor, and writer built on top of soundfile, and PyAV (bindings for FFmpeg)
☆39Mar 31, 2026Updated 3 months ago
ZhikangNiu / Semantic-VAE
View on GitHub
[INTERSPEECH 2026 Oral]Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"
☆121Jun 21, 2026Updated last month
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
ASLP-lab / OSUM-Pangu
View on GitHub
An Open-Source Multidimension Speech Understanding Foundation Model Built upon OpenPangu on Ascend NPUs
☆33Mar 15, 2026Updated 4 months ago
xingchensong / FlashCosyVoice
View on GitHub
FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.
☆250Feb 25, 2026Updated 5 months ago
facebookresearch / audiobox-aesthetics
View on GitHub
Unified automatic quality assessment for speech, music, and sound.
☆747Jun 5, 2025Updated last year
inclusionAI / Ming-omni-tts
View on GitHub
Ming-omni-tts: Simple and Efficient Unified Generation of Speech, Music, and Sound with Precise Control
☆264Feb 26, 2026Updated 5 months ago
inclusionAI / Ming-UniAudio
View on GitHub
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
☆451Nov 27, 2025Updated 8 months ago
yfyeung / CLSP
View on GitHub
[ACL 2026 Main] Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
☆104Apr 6, 2026Updated 3 months ago
yanghaha0908 / WavCube
View on GitHub
Official code for "WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling"
☆62Jun 27, 2026Updated last month
k2-fsa / Flow2GAN
View on GitHub
Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation
☆146Mar 8, 2026Updated 4 months ago
ASLP-lab / VoiceSculptor
View on GitHub
An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.
☆250Feb 26, 2026Updated 5 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
Jerrister / X-VC
View on GitHub
X-VC: Zero-shot Streaming Voice Conversion in Codec Space
☆69May 6, 2026Updated 2 months ago
thu-spmi / CTC-TTS
View on GitHub
Code for CTC-TTS: LLM-based dual-streaming text-to-speech with CTC alignment, Interspeech 2026.
☆20Jun 9, 2026Updated last month
xiquan-li / Resonate
View on GitHub
[INTERSPEECH 2026] Pre-training, SFT, DPO and GRPO for Text-to-Audio Generation
☆48Apr 17, 2026Updated 3 months ago
halsay / ASR-TTS-paper-daily
View on GitHub
Update ASR paper everyday
☆513May 16, 2026Updated 2 months ago
wenet-e2e / wesep
View on GitHub
Target Speaker Extraction Toolkit
☆300Oct 4, 2025Updated 9 months ago
ASLP-lab / SenSE
View on GitHub
Official code of SenSE.
☆90Oct 30, 2025Updated 8 months ago
xcc-zach / xtalk
View on GitHub
X-Talk is an open-source full-duplex cascaded spoken dialogue system framework enabling low-latency, interruptible, and human-like speech…
☆233Updated this week
leospark / FireRedVAD-Engineering
View on GitHub
Lightweight streaming Voice Activity Detection (VAD) tool with ONNX runtime
☆24Mar 18, 2026Updated 4 months ago
xingchensong / S3Tokenizer
View on GitHub
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
☆521Dec 22, 2025Updated 7 months ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
Gilgamesh-J / X-ASR
View on GitHub
X-ASR is a series of automatic speech recognition models based on the icefall framework, focusing on streaming ASR and low-latency deploy…
☆148Jul 8, 2026Updated 3 weeks ago
RayYuki / CodecBench
View on GitHub
☆24Nov 16, 2025Updated 8 months ago
ASLP-lab / MeanVC
View on GitHub
A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
☆298Jan 8, 2026Updated 6 months ago
xiaomi-research / tts-prism
View on GitHub
☆47Apr 27, 2026Updated 3 months ago
Soul-AILab / SAC
View on GitHub
[ACL 2026 Main] Training, inference, and testing of the SAC speech codec model.
☆108Nov 1, 2025Updated 8 months ago
wsntxxn / UniFlow-Audio
View on GitHub
☆74Jul 17, 2026Updated last week
ASLP-lab / ArxivWatcher
View on GitHub
☆32Jun 15, 2026Updated last month