BUTSpeechFIT / DiariZen
A toolkit for speaker diarization.
☆174, updated last week
Alternatives and similar repositories for DiariZen:
Users interested in DiariZen are comparing it to the libraries listed below.
- We Speech Transcript based on an LLM, in 300 lines of code. (☆156, updated last month)
- A lightweight end-to-end text-to-speech model. (☆111, updated last month)
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University. (☆141, updated this week)
- (☆159, updated 4 months ago)
- Open-source inference code for Rev's model. (☆389, updated 3 weeks ago)
- OSUM: Open Speech Understanding Model, open-sourced by ASLP@NPU. (☆346, updated 2 weeks ago)
- Speech diarization for Scrum automation. (☆102, updated last year)
- GPT-4o-level, real-time spoken dialogue system. (☆302, updated 2 months ago)
- An enterprise-grade voice activity detector from ModelScope and FunASR. (☆88, updated last year)
- The audio sample repository for the speech separation model "MossFormer2". (☆120, updated 4 months ago)
- Papers, code, and resources for speech language models and end-to-end speech dialogue systems. (☆163, updated 4 months ago)
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction (☆166, updated last month)
- An evolving, large-scale, multi-domain ASR corpus for low-resource languages, with automated crawling, transcription, and refinement. (☆148, updated 3 weeks ago)
- Target Speaker Extraction Toolkit (☆155, updated 3 weeks ago)
- SenseVoice-python: an enterprise-grade, open-source, multilingual ASR system from the FunASR open-source project, running on ONNX Runtime. (☆86, updated 6 months ago)
- Reverse engineering of the Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice. (☆277, updated 2 months ago)
- TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loud… (☆93, updated 3 months ago)
- LlamaVoice is a Llama-based large voice generation model that provides inference and training capabilities. (☆232, updated 7 months ago)
- FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music gener… (☆392, updated last year)
- (☆193, updated 6 months ago)
- RealSI: Open Benchmark for Simultaneous Interpretation in Real-world Scenarios (☆53, updated 4 months ago)
- VoiceBench: Benchmarking LLM-Based Voice Assistants (☆159, updated last week)
- Grapheme-to-phoneme conversion for mixed Chinese (Mandarin or Cantonese) and English. (☆94, updated 2 weeks ago)
- Codec for the paper "LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis". (☆238, updated 3 weeks ago)
- An industrial-level, controllable, and efficient zero-shot text-to-speech system. (☆406, updated last week)
- Python wrapper for Silero VAD. (☆49, updated 3 months ago)
- MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction mode… (☆199, updated 2 months ago)
- CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages (☆129, updated last month)
- The official PyTorch implementation of "Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based …" (☆126, updated last month)
- F5-TTS inference acceleration: roughly 4x faster! (☆64, updated 2 months ago)