VITA-MLLM / VITA
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
☆2,364 Updated 4 months ago
Alternatives and similar repositories for VITA
Users who are interested in VITA are comparing it to the repositories listed below.
- ☆919 Updated 4 months ago
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. ☆1,818 Updated 3 months ago
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… ☆3,419 Updated last month
- GPT4V-level open-source multi-modal model based on Llama3-8B ☆2,392 Updated 4 months ago
- Next-Token Prediction is All You Need