VITA-MLLM / VITA-Audio
✨✨VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
☆614 Updated last month
Alternatives and similar repositories for VITA-Audio
Users who are interested in VITA-Audio are comparing it to the repositories listed below:
- PyTorch implementation of [ThinkSound], a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reas… ☆515 Updated this week
- Ola: Pushing the Frontiers of Omni-Modal Language Model ☆347 Updated last month
- Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities. ☆1,769 Updated 5 months ago
- The first Large Audio Language Model that enables native in-depth thinking, which is trained on large-scale audio Chain-of-Thought data. ☆233 Updated last month
- [ICML 2025] PyTorch Implementation of "OmniAudio: Generating Spatial Audio from 360-Degree Video" ☆311 Updated 2 weeks ago
- Real Time High-Fidelity Faceswap ☆823 Updated last month
- Turn detection for full-duplex dialogue communication ☆313 Updated last week
- A song aesthetic evaluation toolkit trained on SongEval.