nari-labs / dia2
TTS model capable of streaming conversational audio in realtime.
☆128 Updated last week
Alternatives and similar repositories for dia2
Users who are interested in dia2 are comparing it to the repositories listed below.
- VLLM Port of the Chatterbox TTS model ☆337 Updated last month
- An open-source implementation of Whisper ☆459 Updated last month
- Lightning-fast, on-device TTS — running natively via ONNX. ☆1,188 Updated this week
- VoiceStar: Robust, Duration-controllable TTS that can Extrapolate ☆295 Updated 5 months ago
- ☆264 Updated 3 weeks ago
- Liquid Audio - Speech-to-Speech audio models by Liquid AI ☆276 Updated 2 months ago
- Self-host the ultra-lightweight Kitten TTS model with this enhanced API server with an intuitive Web UI, large text processing for audiob… ☆213 Updated 3 months ago
- ☆527 Updated last month
- ☆313 Updated 3 months ago
- ☆635 Updated 2 weeks ago
- Optimized Whisper models for streaming and on-device use ☆592 Updated this week
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆215 Updated 7 months ago
- Fast Streaming TTS with Orpheus + WebRTC (with FastRTC) ☆344 Updated 7 months ago
- Make text LLMs listen and speak ☆994 Updated last week
- Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support… ☆692 Updated last month
- ComfyDeployed ☆429 Updated 2 months ago
- Whisper STT + Orpheus TTS + Gemma 3 using LM Studio to create a virtual assistant. ☆72 Updated 6 months ago
- Open-source framework for developing real-time multimodal conversational AI agents. ☆531 Updated last week
- Open Audio Watermarking Tool ☆378 Updated 5 months ago
- A lightweight recreation of OS1/Samantha from the movie Her, running locally in the browser ☆111 Updated 4 months ago
- Streaming and Fine-tuning for Chatterbox TTS ☆220 Updated 5 months ago
- Automated speech dataset creator ☆209 Updated 5 months ago
- ☆203 Updated last month
- ☆399 Updated 2 weeks ago
- Unofficial WIP LoRa Finetuning repository for VibeVoice ☆262 Updated 2 months ago
- A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model excels at editing emotion, speaking style, and paralinguistics… ☆706 Updated this week
- Tencent Hunyuan A13B (short as Hunyuan-A13B), an innovative and open-source LLM built on a fine-grained MoE architecture. ☆805 Updated 4 months ago
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B ☆506 Updated last week
- OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language. ☆580 Updated last month
- ☆300 Updated 3 months ago