The official implement of VITA, VITA15, LongVITA, VITA-Audio, VITA-VLA, and VITA-E.
☆157Oct 28, 2025Updated 6 months ago
Alternatives and similar repositories for VITA
Users that are interested in VITA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- This is a PyTorch implementation of 3DRefTR proposed by our paper "A Unified Framework for 3D Point Cloud Visual Grounding"☆26Aug 24, 2023Updated 2 years ago
- ☆32Jul 29, 2024Updated last year
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"☆48Sep 3, 2025Updated 8 months ago
- [NeurIPS 2024 Oral] RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation☆19Dec 22, 2024Updated last year
- MelGAN and Tacotron 2 in PyTorch☆11Oct 22, 2019Updated 6 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- ☆17Feb 1, 2026Updated 3 months ago
- Onset-and-Offset-Aware Sound Event Detection☆21Feb 10, 2025Updated last year
- ☆19Jan 7, 2026Updated 4 months ago
- The official implement of "Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings"☆18Dec 5, 2024Updated last year
- Code release for "Weakly Supervised Open-Vocabulary Object Detection", AAAI2024☆35Sep 9, 2024Updated last year
- [MM2024 Oral] 3D-GRES: Generalized 3D Referring Expression Segmentation☆42Dec 15, 2024Updated last year
- The official implement of Freeze-Omni.☆15Jul 10, 2025Updated 10 months ago
- ✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model☆680May 24, 2025Updated last year
- Official implementation of "OpenCity3D: What do Vision-Language Models know about Urban Environments?" @ WACV2025☆17Nov 24, 2024Updated last year
- Bare Metal GPUs on DigitalOcean Gradient AI • AdPurpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her☆60Apr 14, 2025Updated last year
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆43Apr 10, 2025Updated last year
- FreeVA: Offline MLLM as Training-Free Video Assistant☆69Jun 9, 2024Updated last year
- LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025))☆58Jun 9, 2025Updated 11 months ago
- ☆18Mar 4, 2024Updated 2 years ago
- [AAAI 2026 & ACL 2026] The official implementation of the DIFFA series for dLLM-based large audio language model☆79Apr 7, 2026Updated last month
- ✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Vi…☆78Apr 28, 2025Updated last year
- [CVPR2025] FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression☆65Oct 10, 2025Updated 7 months ago
- ☆27Jan 30, 2024Updated 2 years ago
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- ☆11Jun 21, 2025Updated 11 months ago
- ☆90May 10, 2024Updated 2 years ago
- ☆14Apr 25, 2025Updated last year
- ☆21Feb 29, 2024Updated 2 years ago
- ☆13May 17, 2025Updated last year
- Self-supervised algorithm for learning representations from ego-centric video data. Code is tested on EPIC-Kitchens-100 and Ego4D in PyTo…☆13Oct 23, 2022Updated 3 years ago
- ☆37Mar 31, 2026Updated last month
- Differentiable non-uniform interpolation: https://arxiv.org/abs/2012.13257☆11Oct 3, 2021Updated 4 years ago
- NICE challenge 2023 Track2 2nd result(total 4th) (CVPR 2023) sponsered by LG AI/Shutterstock/SNU☆11Jun 22, 2023Updated 2 years ago
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- Extension to `F.grid_sample` that allows using batch index per grid point.☆19Jun 27, 2023Updated 2 years ago
- An operation trying to do the opposite of F.grid_sample☆20Aug 8, 2023Updated 2 years ago
- Code for PromptNet☆16Jan 29, 2025Updated last year
- ☆21May 7, 2026Updated 2 weeks ago
- [TPAMI2025] BackMix: Regularizing Open Set Recognition by Removing Underlying Fore-Background Priors☆16Apr 23, 2025Updated last year
- Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation☆32Mar 28, 2025Updated last year
- ☆58Updated this week