FreeVA: Offline MLLM as Training-Free Video Assistant
☆69Jun 9, 2024Updated last year
Alternatives and similar repositories for FreeVA
Users that are interested in FreeVA are comparing it to the libraries listed below.
- This is a PyTorch implementation of 3DRefTR proposed by our paper "A Unified Framework for 3D Point Cloud Visual Grounding"☆26Aug 24, 2023Updated 2 years ago
- 【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs"☆22Sep 26, 2024Updated last year
- 【NeurIPS 2024】Dense Connector for MLLMs☆183Oct 14, 2024Updated last year
- Long Context Transfer from Language to Vision☆403Mar 18, 2025Updated last year
- ☆32Jul 29, 2024Updated last year
- [AAAI 2024] The official implementation of the paper "3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Refer…☆45Dec 20, 2023Updated 2 years ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆20Oct 17, 2024Updated last year
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆44Mar 11, 2025Updated last year
- ☆19Jan 7, 2026Updated 4 months ago
- Matryoshka Multimodal Models☆123Jan 22, 2025Updated last year
- ☆157Oct 31, 2024Updated last year
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM☆86Oct 25, 2024Updated last year
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding☆49Jan 9, 2024Updated 2 years ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆57Mar 9, 2025Updated last year
- Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision☆44Oct 19, 2025Updated 6 months ago
- ☆107Jul 30, 2024Updated last year
- ☆359Jan 27, 2024Updated 2 years ago
- ☆83Oct 31, 2024Updated last year
- PyTorch code for the CVPR'23 paper: "ConStruct-VL: Data-Free Continual Structured VL Concepts Learning"☆13Feb 5, 2024Updated 2 years ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆213Jan 6, 2025Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model☆282Jun 25, 2024Updated last year
- ☆42Apr 7, 2024Updated 2 years ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models☆167Mar 8, 2026Updated 2 months ago
- The official repository of "Video assistant towards large language model makes everything easy"☆232Dec 24, 2024Updated last year
- ☆14Apr 25, 2025Updated last year
- ☆18Jul 10, 2024Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts☆17Apr 2, 2025Updated last year
- Survey of small language models☆18Jul 23, 2024Updated last year
- ☆13Apr 13, 2026Updated 3 weeks ago
- ☆81Nov 24, 2024Updated last year
- Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos☆70Sep 5, 2025Updated 8 months ago
- Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning☆28Oct 30, 2024Updated last year
- ☆18Apr 4, 2025Updated last year
- [ICML 2025] Official PyTorch implementation of LongVU☆424May 8, 2025Updated last year
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression☆67Feb 19, 2025Updated last year
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]☆21Feb 27, 2025Updated last year
- team Doggeee's solution to Ego4D LTA challenge@CVPRW23'☆14Nov 4, 2023Updated 2 years ago
- [NeurIPS 2024 Oral] RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation☆19Dec 22, 2024Updated last year
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts☆339Jul 17, 2024Updated last year