The official implementation of VITA, VITA15, LongVITA, VITA-Audio, VITA-VLA, and VITA-E.
☆148 · Oct 28, 2025 · Updated 4 months ago
Alternatives and similar repositories for VITA
Users who are interested in VITA are comparing it to the repositories listed below
- ☆37 · Jul 9, 2024 · Updated last year
- This is a PyTorch implementation of 3DRefTR proposed by our paper "A Unified Framework for 3D Point Cloud Visual Grounding" ☆26 · Aug 24, 2023 · Updated 2 years ago
- Code release for "Weakly Supervised Open-Vocabulary Object Detection", AAAI2024 ☆35 · Sep 9, 2024 · Updated last year
- real-to-sim evaluation suite for robot parkour ☆11 · Jan 19, 2025 · Updated last year
- MelGAN and Tacotron 2 in PyTorch ☆11 · Oct 22, 2019 · Updated 6 years ago
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs" ☆48 · Sep 3, 2025 · Updated 6 months ago
- ☆32 · Jul 29, 2024 · Updated last year
- ☆18 · Mar 4, 2024 · Updated 2 years ago
- [AAAI 2025] Does VLM Classification Benefit from LLM Description Semantics? ☆25 · Aug 5, 2025 · Updated 6 months ago
- Implementation of papers in 101 lines of code. ☆18 · Nov 12, 2023 · Updated 2 years ago
- The official implementation of Freeze-Omni. ☆15 · Jul 10, 2025 · Updated 7 months ago
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models ☆43 · Apr 10, 2025 · Updated 10 months ago
- Extension to `F.grid_sample` that allows using batch index per grid point. ☆19 · Jun 27, 2023 · Updated 2 years ago
- ☆17 · Aug 7, 2024 · Updated last year
- An operation trying to do the opposite of F.grid_sample ☆20 · Aug 8, 2023 · Updated 2 years ago
- [NeurIPS 2024 Oral] RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation ☆19 · Dec 22, 2024 · Updated last year
- [MM2024 Oral] 3D-GRES: Generalized 3D Referring Expression Segmentation ☆42 · Dec 15, 2024 · Updated last year
- ☆19 · Jan 7, 2026 · Updated last month
- ☆21 · Feb 29, 2024 · Updated 2 years ago
- M3GPT: An advanced multimodal, multitask framework for motion comprehension and generation. ☆19 · Dec 12, 2024 · Updated last year
- [AAAI 2024] The official implementation of the paper "3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Refer… ☆44 · Dec 20, 2023 · Updated 2 years ago
- Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official Pytorch Implementation (CVPR'25, Highlight) ☆26 · Jun 6, 2025 · Updated 8 months ago
- ☆23 · Jan 8, 2024 · Updated 2 years ago
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her ☆59 · Apr 14, 2025 · Updated 10 months ago
- [NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning ☆41 · Oct 14, 2025 · Updated 4 months ago
- Modality Gap–Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models ☆51 · Feb 23, 2026 · Updated last week
- Automatic Integration for Neural Spatio-Temporal Point Process models (AI-STPP) is a new paradigm for exact, efficient, non-parametric inf… ☆25 · Oct 14, 2024 · Updated last year
- The implementation for CIKM 2024: Towards Completeness-Oriented Tool Retrieval for Large Language Models. ☆24 · Nov 6, 2024 · Updated last year
- ☆31 · Sep 21, 2024 · Updated last year
- Official Implementation (Pytorch) of "DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Represe… ☆27 · Jun 24, 2024 · Updated last year
- ✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model ☆673 · May 24, 2025 · Updated 9 months ago
- ✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Vi… ☆77 · Apr 28, 2025 · Updated 10 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆69 · Jun 9, 2024 · Updated last year
- The official implementation of the DIFFA series for dLLM-based large audio language model ☆59 · Feb 2, 2026 · Updated last month
- ☆32 · Dec 20, 2023 · Updated 2 years ago
- Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation ☆31 · Mar 28, 2025 · Updated 11 months ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types ☆32 · Jul 16, 2025 · Updated 7 months ago
- [ICCV 2025] Dynamic-VLM ☆28 · Dec 16, 2024 · Updated last year
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆203 · Jun 18, 2025 · Updated 8 months ago