The official implement of VITA, VITA15, LongVITA, VITA-Audio, VITA-VLA, and VITA-E.
☆155Oct 28, 2025Updated 6 months ago
Alternatives and similar repositories for VITA
Users that are interested in VITA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆38Jul 9, 2024Updated last year
- This is a PyTorch implementation of 3DRefTR proposed by our paper "A Unified Framework for 3D Point Cloud Visual Grounding"☆26Aug 24, 2023Updated 2 years ago
- ☆32Jul 29, 2024Updated last year
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"☆48Sep 3, 2025Updated 8 months ago
- [NeurIPS 2024 Oral] RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation☆19Dec 22, 2024Updated last year
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- MelGAN and Tacotron 2 in PyTorch☆11Oct 22, 2019Updated 6 years ago
- ☆16Feb 1, 2026Updated 3 months ago
- real-to-sim evaluation suite for robot parkour☆11Jan 19, 2025Updated last year
- Onset-and-Offset-Aware Sound Event Detection☆22Feb 10, 2025Updated last year
- [AAAI 2024] The official implementation of the paper "3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Refer…☆45Dec 20, 2023Updated 2 years ago
- ☆19Jan 7, 2026Updated 3 months ago
- The official implement of "Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings"☆18Dec 5, 2024Updated last year
- Code release for "Weakly Supervised Open-Vocabulary Object Detection", AAAI2024☆35Sep 9, 2024Updated last year
- [MM2024 Oral] 3D-GRES: Generalized 3D Referring Expression Segmentation☆42Dec 15, 2024Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- ✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model☆679May 24, 2025Updated 11 months ago
- Official implementation of "OpenCity3D: What do Vision-Language Models know about Urban Environments?" @ WACV2025☆17Nov 24, 2024Updated last year
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her☆60Apr 14, 2025Updated last year
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆42Apr 10, 2025Updated last year
- LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025))☆58Jun 9, 2025Updated 10 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant☆69Jun 9, 2024Updated last year
- Implementation of papers in 101 lines of code.☆18Nov 12, 2023Updated 2 years ago
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs☆47Sep 19, 2025Updated 7 months ago
- ☆18Mar 4, 2024Updated 2 years ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- [AAAI 2026 & ACL 2026] The official implementation of the DIFFA series for dLLM-based large audio language model☆79Apr 7, 2026Updated 3 weeks ago
- [CVPR2025] FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression☆65Oct 10, 2025Updated 6 months ago
- ☆27Jan 30, 2024Updated 2 years ago
- ☆11Jun 21, 2025Updated 10 months ago
- [CVPR'26] UniGame code implementation☆19Apr 21, 2026Updated last week
- Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official Pytorch Implementation (CVPR'25, Highlight)☆29Jun 6, 2025Updated 10 months ago
- ☆90May 10, 2024Updated last year
- ☆14Apr 25, 2025Updated last year
- ☆21Feb 29, 2024Updated 2 years ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- ☆13May 17, 2025Updated 11 months ago
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency☆62Jun 6, 2025Updated 10 months ago
- Self-supervised algorithm for learning representations from ego-centric video data. Code is tested on EPIC-Kitchens-100 and Ego4D in PyTo…☆13Oct 23, 2022Updated 3 years ago
- Data generator for stereo sound event localization and detection task of DCASE 2025 challenge☆16Jul 17, 2025Updated 9 months ago
- ☆35Mar 31, 2026Updated last month
- Differentiable non-uniform interpolation: https://arxiv.org/abs/2012.13257☆11Oct 3, 2021Updated 4 years ago
- M3GPT: An advanced multimodal, multitask framework for motion comprehension and generation.☆20Dec 12, 2024Updated last year