pengzhendong/wavesurfer

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/pengzhendong/wavesurfer)

pengzhendong / wavesurfer

For audio visualization and playback in Jupyter notebooks.

☆18

Alternatives and similar repositories for wavesurfer

Users that are interested in wavesurfer are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Mddct / transformer-vocos
View on GitHub
☆35Sep 6, 2025Updated 10 months ago
pengzhendong / streaming-ChatTTS
View on GitHub
☆23Oct 30, 2024Updated last year
pengzhendong / audiolab
View on GitHub
A streaming audio reader, processor, and writer built on top of soundfile, and PyAV (bindings for FFmpeg)
☆39Mar 31, 2026Updated 3 months ago
pengzhendong / ngram-punctuator
View on GitHub
An N-gram punctuator for Chinese and English.
☆18Oct 14, 2025Updated 9 months ago
3loi / NaturalVoices
View on GitHub
☆61Oct 22, 2025Updated 9 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
P2Oileen / CitationHelper
View on GitHub
Google Scholar自搜小脚本，每次开启命令行即显示当前citation。Small Script displaying current citation count each time the shell is opened.
☆21Mar 3, 2025Updated last year
pengzhendong / torchfa
View on GitHub
Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.
☆61Sep 5, 2025Updated 10 months ago
pengzhendong / streaming-tts-webui
View on GitHub
Streaming Text to Speech Web UI
☆22May 6, 2024Updated 2 years ago
iver56 / loudness
View on GitHub
The world's fastest Python package for calculating integrated loudness (LUFS) from audio data as NumPy arrays
☆31Dec 26, 2025Updated 6 months ago
Alittleegg / Eureka-Audio
View on GitHub
Eureka-Audio: A 1.7B lightweight audio–language model that matches 7B–30B models on ASR, audio understanding, and paralinguistic reasonin…
☆40Apr 11, 2026Updated 3 months ago
AI-Hypercomputer / ray-tpu
View on GitHub
☆15May 11, 2025Updated last year
inclusionAI / MingTok-Audio
View on GitHub
☆88Feb 24, 2026Updated 5 months ago
SNTube / Streaming-Captions
View on GitHub
基于Streaming-SenseVoice项目的伪流式实时字幕界面
☆13Apr 15, 2025Updated last year
wx9Songs / MOSS-Music-Data-Pipeline
View on GitHub
☆44Apr 26, 2026Updated 2 months ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
scottishfold0621 / ACMID
View on GitHub
☆26Apr 30, 2026Updated 2 months ago
baichuan-inc / Baichuan-Audio
View on GitHub
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
☆223Feb 28, 2025Updated last year
kwatcharasupat / source-separation-landing
View on GitHub
Landing Page for All Things Source Separation
☆38Sep 12, 2025Updated 10 months ago
makerjackie / tts-frontend-dataset
View on GitHub
TTS FrontEnd DataSet: Polyphone / Prosody / TextNormalization
☆104Feb 5, 2024Updated 2 years ago
luotianze666 / WaveFM
View on GitHub
[NAACL 2025] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching
☆133Apr 8, 2026Updated 3 months ago
JishengBai / AudioSetCaps
View on GitHub
A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline
☆208Dec 13, 2024Updated last year
BayLing-Models / BayLing-Duplex
View on GitHub
Native full-duplex speech dialogue inference for BayLing-Duplex.
☆63Jun 22, 2026Updated last month
Sreyan88 / ReCLAP
View on GitHub
☆33Dec 23, 2025Updated 7 months ago
pengzhendong / welm
View on GitHub
One command to build TLG.fst for WeNet.
☆30Oct 11, 2022Updated 3 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
XiaomiMiMo / MiMo-Audio-Eval
View on GitHub
☆88Jun 17, 2026Updated last month
EuiYeonKim / BSMamba2
View on GitHub
Official Implementation of BSMamba2
☆16Nov 28, 2025Updated 7 months ago
crlandsc / torch-log-wmse
View on GitHub
logWMSE, an audio quality metric & loss function with support for digital silence target. Useful for training and evaluating audio source…
☆48Apr 29, 2026Updated 2 months ago
Rongjiehuang / awesome-speech-to-speech-translation
View on GitHub
List of direct speech-to-speech translation papers.
☆39Jan 31, 2023Updated 3 years ago
ddlBoJack / MMAR
View on GitHub
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
☆214Feb 25, 2026Updated 5 months ago
ictnlp / SLED-TTS
View on GitHub
Streamable Text-to-Speech model using a language modeling approach, without vector quantization
☆108May 20, 2025Updated last year
ArenAcikgoz / Whisper-Alignment
View on GitHub
Forced alignment decoder for Whisper.
☆16Mar 13, 2024Updated 2 years ago
ASLP-lab / Easy-Turn
View on GitHub
Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems
☆122Jan 25, 2026Updated 6 months ago
NVIDIA / audio-intelligence
View on GitHub
Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with syntheti…
☆137Mar 3, 2026Updated 4 months ago
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
xingchensong / TouchNet
View on GitHub
A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.
☆233Jul 2, 2026Updated 3 weeks ago
XiaomiMiMo / MiMo-Audio-Tokenizer
View on GitHub
A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.
☆145Sep 19, 2025Updated 10 months ago
ozspeech / OZSpeech
View on GitHub
[ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
☆45Feb 9, 2025Updated last year
facebookresearch / lst
View on GitHub
Code for Latent Speech-Text Transformer (LST)
☆35Mar 12, 2026Updated 4 months ago
wonjune-kang / expressive-speech-retrieval
View on GitHub
Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
☆15Aug 18, 2025Updated 11 months ago
xiaomi-research / dasheng-tokenizer
View on GitHub
State-of-the-art continious audio tokenization
☆40Mar 9, 2026Updated 4 months ago
ws-choi / sdx23
View on GitHub
KUIELAB-MDX-Net got the 2nd place on the Leaderboard A and the 3rd place on the Leaderboard B in the MDX-Challenge ISMIR 2021
☆16Mar 13, 2023Updated 3 years ago