The official implementation of VITA, VITA15, LongVITA, VITA-Audio, VITA-VLA, and VITA-E.
☆160Oct 28, 2025Updated 8 months ago
Alternatives and similar repositories for VITA
Users who are interested in VITA are comparing it to the repositories listed below.
- ☆32Jul 29, 2024Updated last year
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"☆48Sep 3, 2025Updated 10 months ago
- MelGAN and Tacotron 2 in PyTorch☆11Oct 22, 2019Updated 6 years ago
- ☆18Feb 1, 2026Updated 5 months ago
- real-to-sim evaluation suite for robot parkour☆11Jan 19, 2025Updated last year
- Onset-and-Offset-Aware Sound Event Detection☆21Feb 10, 2025Updated last year
- Code release for "Weakly Supervised Open-Vocabulary Object Detection", AAAI2024☆36Sep 9, 2024Updated last year
- [MM2024 Oral] 3D-GRES: Generalized 3D Referring Expression Segmentation☆43Dec 15, 2024Updated last year
- ✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model☆681May 24, 2025Updated last year
- FreeVA: Offline MLLM as Training-Free Video Assistant☆69Jun 9, 2024Updated 2 years ago
- Implementation of papers in 101 lines of code.☆18Nov 12, 2023Updated 2 years ago
- LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025)☆61Jun 9, 2025Updated last year
- ☆18Mar 4, 2024Updated 2 years ago
- [AAAI 2026 & ACL 2026] The official implementation of the DIFFA series for dLLM-based large audio language model☆82Apr 7, 2026Updated 2 months ago
- ☆11Jun 21, 2025Updated last year
- [CVPR'26] UniGame code implementation☆20Apr 21, 2026Updated 2 months ago
- A Guide for Modding a RTX 3070 to 16 GB VRAM☆67Nov 30, 2025Updated 7 months ago
- Official implementation of Dexterity from Smart Lenses Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations. Project w…☆57Dec 26, 2025Updated 6 months ago
- ☆15Apr 25, 2025Updated last year
- This repository contains Reinforcement Learning (RL) environments for the Upkie robot.☆31May 26, 2026Updated last month
- ICASSP2026 HumDial Challenge☆48May 28, 2026Updated last month
- ☆21Feb 29, 2024Updated 2 years ago
- ☆13May 17, 2025Updated last year
- ☆42Mar 31, 2026Updated 3 months ago
- NICE challenge 2023 Track2 2nd result (total 4th) (CVPR 2023) sponsored by LG AI/Shutterstock/SNU☆11Jun 22, 2023Updated 3 years ago
- Extension to `F.grid_sample` that allows using batch index per grid point.☆19Jun 27, 2023Updated 3 years ago
- An operation trying to do the opposite of F.grid_sample☆20Aug 8, 2023Updated 2 years ago
- ☆21Jun 4, 2026Updated last month
- [TPAMI2025] BackMix: Regularizing Open Set Recognition by Removing Underlying Fore-Background Priors☆16Apr 23, 2025Updated last year
- ☆63Jun 23, 2026Updated last week
- ☆11Aug 12, 2014Updated 11 years ago
- [ICCV 2025] The official PyTorch implementation of "LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs".☆24Oct 28, 2025Updated 8 months ago
- ☆17May 18, 2024Updated 2 years ago
- ☆46Aug 17, 2024Updated last year
- ☆18Aug 17, 2025Updated 10 months ago
- ACM MM 2022 - PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding☆11Aug 12, 2022Updated 3 years ago
- ✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction☆2,520Mar 28, 2025Updated last year
- ☆33Sep 21, 2024Updated last year
- ☆23Jan 8, 2024Updated 2 years ago