FreeVA: Offline MLLM as Training-Free Video Assistant
☆69Jun 9, 2024Updated last year
Alternatives and similar repositories for FreeVA
Users that are interested in FreeVA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- This is a PyTorch implementation of 3DRefTR proposed by our paper "A Unified Framework for 3D Point Cloud Visual Grounding"☆26Aug 24, 2023Updated 2 years ago
- 【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs"☆22Sep 26, 2024Updated last year
- 【NeurIPS 2024】Dense Connector for MLLMs☆182Oct 14, 2024Updated last year
- Long Context Transfer from Language to Vision☆403Mar 18, 2025Updated last year
- ☆32Jul 29, 2024Updated last year
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- [AAAI 2024] The official implementation of the paper "3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Refer…☆44Dec 20, 2023Updated 2 years ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆20Oct 17, 2024Updated last year
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆43Mar 11, 2025Updated last year
- ☆19Jan 7, 2026Updated 2 months ago
- Matryoshka Multimodal Models☆122Jan 22, 2025Updated last year
- ☆157Oct 31, 2024Updated last year
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM☆86Oct 25, 2024Updated last year
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding☆49Jan 9, 2024Updated 2 years ago
- Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision☆43Oct 19, 2025Updated 5 months ago
- Open source password manager - Proton Pass • AdSecurely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆55Mar 9, 2025Updated last year
- ☆107Jul 30, 2024Updated last year
- ☆360Jan 27, 2024Updated 2 years ago
- ☆83Oct 31, 2024Updated last year
- PyTorch code for the CVPR'23 paper: "ConStruct-VL: Data-Free Continual Structured VL Concepts Learning"☆14Feb 5, 2024Updated 2 years ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆213Jan 6, 2025Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model☆281Jun 25, 2024Updated last year
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models☆166Mar 8, 2026Updated 2 weeks ago
- ☆42Apr 7, 2024Updated last year
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- The official repository of "Video assistant towards large language model makes everything easy"☆232Dec 24, 2024Updated last year
- ☆14Apr 25, 2025Updated 11 months ago
- ☆18Jul 10, 2024Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts☆17Apr 2, 2025Updated 11 months ago
- survery of small language models☆18Jul 23, 2024Updated last year
- ☆13Aug 7, 2025Updated 7 months ago
- Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos☆69Sep 5, 2025Updated 6 months ago
- ☆80Nov 24, 2024Updated last year
- ☆18Apr 4, 2025Updated 11 months ago
- NordVPN Special Discount Offer • AdSave on top-rated NordVPN 1 or 2-year plans with secure browsing, privacy protection, and support for for all major platforms.
- Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning☆28Oct 30, 2024Updated last year
- [ICML 2025] Official PyTorch implementation of LongVU☆424May 8, 2025Updated 10 months ago
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression☆67Feb 19, 2025Updated last year
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]☆21Feb 27, 2025Updated last year
- team Doggeee's solution to Ego4D LTA challenge@CVPRW23'☆13Nov 4, 2023Updated 2 years ago
- ☆23Jan 1, 2026Updated 2 months ago
- [NeurIPS 2024 Oral] RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation☆19Dec 22, 2024Updated last year