backspacetg/simul_whisper

Code for our INTERSPEECH paper Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection

☆112

Alternatives and similar repositories for simul_whisper

Users who are interested in simul_whisper are comparing it to the libraries listed below.

ufal / SimulStreaming
☆643 · Jul 12, 2026 · Updated 2 weeks ago
backspacetg / distilXLSR
Models and code for the INTERSPEECH 2023 paper DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model
☆13 · Mar 30, 2025 · Updated last year
tomer9080 / WhisperRT-Streaming
Causal streaming adaptation of OpenAI Whisper for real-time transcription on small audio chunks.
☆75 · Mar 31, 2026 · Updated 3 months ago
pengzhendong / asr-decoder
CTC decoder with hotwords for ASR.
☆38 · Jun 15, 2026 · Updated last month
MooreThreads / MooER
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction mode…
☆219 · Jan 8, 2025 · Updated last year
fyvo / WMT-Biomed-Test
☆13 · Aug 23, 2024 · Updated last year
pengzhendong / speaker-diarization
Offline Speaker Diarization with SenseVoice by Sherpa ONNX.
☆15 · Dec 23, 2024 · Updated last year
huutuongtu / Lightvoc
LightVoc: An Upsampling-Free GAN Vocoder Based on Conformer and Inverse Short-Time Fourier Transform
☆18 · May 17, 2024 · Updated 2 years ago
hhguo / SoCodec
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
☆92 · Dec 20, 2024 · Updated last year
xinshengwang / robpitch
A pitch detection model trained to be robust in noisy and reverberant environments.
☆27 · Jan 21, 2025 · Updated last year
pengzhendong / streaming-vocos
Streaming Vocos
☆31 · Jun 10, 2025 · Updated last year
ufal / whisper_streaming
Whisper realtime streaming for long speech-to-text transcription and translation
☆3,657 · Nov 12, 2025 · Updated 8 months ago
qinxiaoyi / TimeVarying_ASV
☆12 · Oct 17, 2024 · Updated last year
LAION-AI / emotional-speech-annotations
This repository contains prompts and best practices for annotating audio clips in very fine detail using Audio-Language Models
☆35 · Oct 13, 2024 · Updated last year
danliu2 / caat
☆35 · Sep 1, 2022 · Updated 3 years ago
shang0712 / HierTTS
☆47 · Apr 16, 2023 · Updated 3 years ago
youngsheen / GPST
[ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer
☆70 · Nov 1, 2024 · Updated last year
ryota-komatsu / speaker_disentangled_hubert
Official repository of the IEEE OJSP paper "Speaker-Disentangled Chunk-Wise Regression for Syllabic Tokenization"
☆46 · Updated this week
SpeechColab / GigaSpeech2
An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
☆198 · Apr 28, 2026 · Updated 3 months ago
wenet-e2e / wesr
We Speech Transcript based on LLM, in 300 lines of code.
☆182 · Jun 20, 2025 · Updated last year
lovemefan / paraformer-python
Paraformer (Chinese ASR) online ONNX runtime for Python
☆54 · Mar 27, 2024 · Updated 2 years ago
mmmmayi / ExPO
Official implementation of the paper ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification
☆15 · Mar 14, 2025 · Updated last year
X-LANCE / SLAM-LLM
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
☆1,050 · Jan 15, 2026 · Updated 6 months ago
Plachtaa / FAcodec
Training code for FAcodec presented in NaturalSpeech3
☆244 · Aug 26, 2024 · Updated last year
ryota-komatsu / speech_resynth
Speech Resynthesis and Language Modeling
☆27 · Jun 11, 2025 · Updated last year
Srijith-rkr / Whispering-LLaMA
EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction
☆271 · May 19, 2024 · Updated 2 years ago
fss1t / CausalStarGANv2-VC
☆22 · Apr 4, 2023 · Updated 3 years ago
redmist328 / APNet2
Source code of APNet2, a vocoder
☆60 · Nov 23, 2023 · Updated 2 years ago
hs-oh-prml / DiffProsody
☆69 · Jul 29, 2023 · Updated 3 years ago
innnky / FreeSVC
Singing voice conversion based on FreeVC
☆21 · Dec 16, 2022 · Updated 3 years ago
wetdog / wavenext_pytorch
Unofficial implementation of the WaveNeXt vocoder
☆59 · Aug 28, 2024 · Updated last year
yl4579 / StyleTTS-VC
Official Implementation of StyleTTS-VC
☆200 · Jan 14, 2025 · Updated last year
Mddct / usm-tokenizer
Semantic tokenizer for speech and music
☆20 · Jul 6, 2025 · Updated last year
joonaskalda / PixIT
Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings…
☆105 · Jan 10, 2025 · Updated last year
pyf98 / DPHuBERT
INTERSPEECH 2023: "DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models"
☆118 · Jan 26, 2024 · Updated 2 years ago
archinetai / aligner-pytorch
Sequence alignment methods with helpers for PyTorch.
☆24 · Nov 30, 2022 · Updated 3 years ago
atosystem / SSL_Interface
Interface Design for Self-Supervised Speech Models, accepted to Interspeech 2024
☆16 · Nov 19, 2024 · Updated last year
Respaired / RiFornet_Vocoder
A neural vocoder supporting Ring Attention, Conformer and NSF.
☆25 · Aug 1, 2025 · Updated 11 months ago
pengzhendong / audiolab
A streaming audio reader, processor, and writer built on top of soundfile and PyAV (bindings for FFmpeg)
☆39 · Mar 31, 2026 · Updated 3 months ago