The official implement of VITA, VITA15, LongVITA, VITA-Audio, VITA-VLA, and VITA-E.
☆158Oct 28, 2025Updated 7 months ago
Alternatives and similar repositories for VITA
Users that are interested in VITA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- This is a PyTorch implementation of 3DRefTR proposed by our paper "A Unified Framework for 3D Point Cloud Visual Grounding"☆26Aug 24, 2023Updated 2 years ago
- ☆32Jul 29, 2024Updated last year
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"☆48Sep 3, 2025Updated 9 months ago
- [NeurIPS 2024 Oral] RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation☆19Dec 22, 2024Updated last year
- ☆18Feb 1, 2026Updated 4 months ago
- Open source password manager - Proton Pass • AdSecurely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
- [AAAI 2024] The official implementation of the paper "3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Refer…☆45Dec 20, 2023Updated 2 years ago
- ☆19Jan 7, 2026Updated 5 months ago
- [MM2024 Oral] 3D-GRES: Generalized 3D Referring Expression Segmentation☆43Dec 15, 2024Updated last year
- The official implement of Freeze-Omni.☆15Jul 10, 2025Updated 11 months ago
- Official implementation of "OpenCity3D: What do Vision-Language Models know about Urban Environments?" @ WACV2025☆18Nov 24, 2024Updated last year
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her☆60Apr 14, 2025Updated last year
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆42Apr 10, 2025Updated last year
- FreeVA: Offline MLLM as Training-Free Video Assistant☆69Jun 9, 2024Updated 2 years ago
- Implementation of papers in 101 lines of code.☆18Nov 12, 2023Updated 2 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025))☆61Jun 9, 2025Updated last year
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs☆47Sep 19, 2025Updated 8 months ago
- [AAAI 2026 & ACL 2026] The official implementation of the DIFFA series for dLLM-based large audio language model☆81Apr 7, 2026Updated 2 months ago
- The official repository TimeAudio, a comprehensive framework that incorporates fine-grained acoustic cues into LALMs with enhanced module…☆29Nov 18, 2025Updated 6 months ago
- ✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Vi…☆78Apr 28, 2025Updated last year
- ☆11Jun 21, 2025Updated 11 months ago
- Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official Pytorch Implementation (CVPR'25, Highlight)☆29Jun 6, 2025Updated last year
- A Guide for Modding a RTX 3070 to 16 GB VRAM☆63Nov 30, 2025Updated 6 months ago
- Official implementation of Dexterity from Smart Lenses Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations. Project w…☆54Dec 26, 2025Updated 5 months ago
- Bare Metal GPUs on DigitalOcean Gradient AI • AdPurpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
- ☆90May 10, 2024Updated 2 years ago
- ☆15Apr 25, 2025Updated last year
- This repository contains Reinforcement Learning (RL) environments for the Upkie robot.☆31May 26, 2026Updated 2 weeks ago
- ICASSP2026 HumDial Challenge☆47May 28, 2026Updated 2 weeks ago
- ☆13May 17, 2025Updated last year
- ☆31Nov 4, 2025Updated 7 months ago
- Data generator for stereo sound event localization and detection task of DCASE 2025 challenge☆17Jul 17, 2025Updated 10 months ago
- ☆39Mar 31, 2026Updated 2 months ago
- Extension to `F.grid_sample` that allows using batch index per grid point.☆19Jun 27, 2023Updated 2 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- An operation trying to do the opposite of F.grid_sample☆20Aug 8, 2023Updated 2 years ago
- ☆21Jun 4, 2026Updated last week
- [TPAMI2025] BackMix: Regularizing Open Set Recognition by Removing Underlying Fore-Background Priors☆16Apr 23, 2025Updated last year
- Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation☆32Mar 28, 2025Updated last year
- ☆61Jun 5, 2026Updated last week
- [ICCV 2025] The official pytorch implement of "LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs".☆23Oct 28, 2025Updated 7 months ago
- ☆10Oct 18, 2024Updated last year