The official implement of VITA, VITA15, LongVITA, VITA-Audio, VITA-VLA, and VITA-E.
☆151Oct 28, 2025Updated 5 months ago
Alternatives and similar repositories for VITA
Users that are interested in VITA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆38Jul 9, 2024Updated last year
- This is a PyTorch implementation of 3DRefTR proposed by our paper "A Unified Framework for 3D Point Cloud Visual Grounding"☆26Aug 24, 2023Updated 2 years ago
- ☆32Jul 29, 2024Updated last year
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"☆48Sep 3, 2025Updated 7 months ago
- ✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy☆305May 14, 2025Updated 11 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- ☆15Feb 1, 2026Updated 2 months ago
- [NeurIPS 2024 Oral] RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation☆19Dec 22, 2024Updated last year
- MelGAN and Tacotron 2 in PyTorch☆11Oct 22, 2019Updated 6 years ago
- Onset-and-Offset-Aware Sound Event Detection☆22Feb 10, 2025Updated last year
- [AAAI 2024] The official implementation of the paper "3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Refer…☆45Dec 20, 2023Updated 2 years ago
- ☆19Jan 7, 2026Updated 3 months ago
- The official implement of "Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings"☆18Dec 5, 2024Updated last year
- [MM2024 Oral] 3D-GRES: Generalized 3D Referring Expression Segmentation☆42Dec 15, 2024Updated last year
- The official implement of Freeze-Omni.☆15Jul 10, 2025Updated 9 months ago
- Deploy open-source AI quickly and easily - Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- ✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model☆679May 24, 2025Updated 10 months ago
- The official repository TimeAudio, a comprehensive framework that incorporates fine-grained acoustic cues into LALMs with enhanced module…☆26Nov 18, 2025Updated 4 months ago
- [AAAI 2026 & ACL 2026] The official implementation of the DIFFA series for dLLM-based large audio language model☆76Apr 7, 2026Updated last week
- LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025))☆58Jun 9, 2025Updated 10 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant☆69Jun 9, 2024Updated last year
- ☆13Oct 3, 2024Updated last year
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs☆47Sep 19, 2025Updated 6 months ago
- ☆18Mar 4, 2024Updated 2 years ago
- ✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Vi…☆78Apr 28, 2025Updated 11 months ago
- Serverless GPU API endpoints on Runpod - Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- [CVPR2025] FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression☆62Oct 10, 2025Updated 6 months ago
- A Guide for Modding a RTX 3070 to 16 GB VRAM☆54Nov 30, 2025Updated 4 months ago
- ☆27Jan 30, 2024Updated 2 years ago
- ☆11Jun 21, 2025Updated 9 months ago
- official code for unigame☆19Nov 26, 2025Updated 4 months ago
- Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official Pytorch Implementation (CVPR'25, Highlight)☆28Jun 6, 2025Updated 10 months ago
- This repository contains Reinforcement Learning (RL) environments for the Upkie robot.☆28Mar 11, 2026Updated last month
- ☆90May 10, 2024Updated last year
- 🚀 轻量视频🎥 大模型🤖☆22Apr 27, 2025Updated 11 months ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- ☆14Apr 25, 2025Updated 11 months ago
- ☆13May 17, 2025Updated 10 months ago
- Self-supervised algorithm for learning representations from ego-centric video data. Code is tested on EPIC-Kitchens-100 and Ego4D in PyTo…☆13Oct 23, 2022Updated 3 years ago
- [ICCV 2025] The official pytorch implement of "LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs".☆22Oct 28, 2025Updated 5 months ago
- Differentiable non-uniform interpolation: https://arxiv.org/abs/2012.13257☆11Oct 3, 2021Updated 4 years ago
- ☆35Mar 31, 2026Updated 2 weeks ago
- [ICCV 2025] VLM4D: Towards Spatiotemporal Awareness in Vision Language Models☆45Nov 20, 2025Updated 4 months ago