The official implementation of VITA, VITA15, LongVITA, VITA-Audio, VITA-VLA, and VITA-E.
☆147Oct 28, 2025Updated 4 months ago
Alternatives and similar repositories for VITA
Users that are interested in VITA are comparing it to the libraries listed below.
- ☆37Jul 9, 2024Updated last year
- This is a PyTorch implementation of 3DRefTR proposed by our paper "A Unified Framework for 3D Point Cloud Visual Grounding"☆26Aug 24, 2023Updated 2 years ago
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"☆48Sep 3, 2025Updated 6 months ago
- ☆15Feb 1, 2026Updated last month
- real-to-sim evaluation suite for robot parkour☆11Jan 19, 2025Updated last year
- ☆19Jan 7, 2026Updated 2 months ago
- The official implementation of "Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings"☆18Dec 5, 2024Updated last year
- The official implementation of Freeze-Omni.☆15Jul 10, 2025Updated 8 months ago
- ✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model☆677May 24, 2025Updated 10 months ago
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her☆60Apr 14, 2025Updated 11 months ago
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆43Apr 10, 2025Updated 11 months ago
- LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025)☆57Jun 9, 2025Updated 9 months ago
- The official implementation of the DIFFA series for dLLM-based large audio language model☆68Mar 12, 2026Updated last week
- FreeVA: Offline MLLM as Training-Free Video Assistant☆69Jun 9, 2024Updated last year
- ☆13Oct 3, 2024Updated last year
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs☆47Sep 19, 2025Updated 6 months ago
- ☆18Mar 4, 2024Updated 2 years ago
- ✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Vi…☆77Apr 28, 2025Updated 10 months ago
- This repository contains Reinforcement Learning (RL) environments for the Upkie robot.☆26Mar 11, 2026Updated last week
- Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official PyTorch Implementation (CVPR'25, Highlight)☆28Jun 6, 2025Updated 9 months ago
- ☆27Jan 30, 2024Updated 2 years ago
- ☆11Jun 21, 2025Updated 9 months ago
- ☆91May 10, 2024Updated last year
- ☆14Apr 25, 2025Updated 11 months ago
- ☆21Feb 29, 2024Updated 2 years ago
- ☆13May 17, 2025Updated 10 months ago
- ICASSP2026 HumDial Challenge☆36Dec 13, 2025Updated 3 months ago
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency☆62Jun 6, 2025Updated 9 months ago
- [ICCV 2025] The official PyTorch implementation of "LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs".☆22Oct 28, 2025Updated 4 months ago
- [ICCV 2025] VLM4D: Towards Spatiotemporal Awareness in Vision Language Models☆42Nov 20, 2025Updated 4 months ago
- ☆29Nov 4, 2025Updated 4 months ago
- Data generator for stereo sound event localization and detection task of DCASE 2025 challenge☆14Jul 17, 2025Updated 8 months ago
- M3GPT: An advanced multimodal, multitask framework for motion comprehension and generation.☆19Dec 12, 2024Updated last year
- NICE challenge 2023 Track2 2nd result (total 4th) (CVPR 2023) sponsored by LG AI/Shutterstock/SNU☆11Jun 22, 2023Updated 2 years ago
- An operation trying to do the opposite of F.grid_sample☆20Aug 8, 2023Updated 2 years ago
- ☆18May 27, 2025Updated 9 months ago
- [TPAMI2025] BackMix: Regularizing Open Set Recognition by Removing Underlying Fore-Background Priors☆15Apr 23, 2025Updated 11 months ago
- ☆51Updated this week
- Evolutionary-Algorithm and Large-Language-Model☆22Nov 5, 2024Updated last year