whwu95/FreeVA

FreeVA: Offline MLLM as Training-Free Video Assistant

☆69

Alternatives and similar repositories for FreeVA

Users who are interested in FreeVA are comparing it to the repositories listed below.

takomc / amp
【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs"
☆22 · Sep 26, 2024 · Updated last year

Leon1207 / 3DRefTR
This is a PyTorch implementation of 3DRefTR proposed by our paper "A Unified Framework for 3D Point Cloud Visual Grounding"
☆26 · Aug 24, 2023 · Updated 2 years ago

HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆183 · Oct 14, 2024 · Updated last year

EvolvingLMMs-Lab / LongVA
Long Context Transfer from Language to Vision
☆407 · Mar 18, 2025 · Updated last year

Share14 / ShareGemini
☆32 · Jul 29, 2024 · Updated 2 years ago

LaVi-Lab / Visual-Table
[EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"
☆20 · Oct 17, 2024 · Updated last year

sosppxo / 3D-STMN
[AAAI 2024] The official implementation of the paper "3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Refer…
☆45 · Dec 20, 2023 · Updated 2 years ago

showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆46 · Mar 11, 2025 · Updated last year

YouHuang67 / mamba-code-explained
☆19 · Jan 7, 2026 · Updated 6 months ago

mu-cai / matryoshka-mm
Matryoshka Multimodal Models
☆123 · Jan 22, 2025 · Updated last year

RifleZhang / LLaVA-Hound-DPO
☆158 · Oct 31, 2024 · Updated last year

Hon-Wong / Elysium
[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM
☆89 · Oct 25, 2024 · Updated last year

RenShuhuai-Andy / TESTA
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
☆50 · Jan 9, 2024 · Updated 2 years ago

CASIA-IVA-Lab / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆57 · Mar 9, 2025 · Updated last year

Shengcao-Cao / groundLMM
Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
☆47 · Oct 19, 2025 · Updated 9 months ago

ziplab / LongVLM
☆108 · Jul 30, 2024 · Updated last year

tsb0601 / MMVP
☆365 · Jan 27, 2024 · Updated 2 years ago

contrastive / FreeVideoLLM
☆83 · Oct 31, 2024 · Updated last year

jamessealesmith / ConStruct-VL
PyTorch code for the CVPR'23 paper: "ConStruct-VL: Data-Free Continual Structured VL Concepts Learning"
☆13 · Feb 5, 2024 · Updated 2 years ago

FreedomIntelligence / LongLLaVA
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆211 · Jan 6, 2025 · Updated last year

42Shawn / LLaVA-PruMerge
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
☆173 · Mar 8, 2026 · Updated 4 months ago

FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆281 · Jun 25, 2024 · Updated 2 years ago

HJYao00 / Side4Video
☆42 · Apr 7, 2024 · Updated 2 years ago

RupertLuo / Valley
The official repository of "Video assistant towards large language model makes everything easy"
☆232 · Dec 24, 2024 · Updated last year

showlab / VisInContext
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
☆28 · Oct 30, 2024 · Updated last year

marinero4972 / Open-o3-Video
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆157 · May 1, 2026 · Updated 2 months ago

dengandong / GroundMoRe
☆18 · May 18, 2026 · Updated 2 months ago

DCDmllm / Momentor
☆81 · Nov 24, 2024 · Updated last year

OpenCSGs / Awesome-SLMs
Survey of small language models
☆18 · Jul 23, 2024 · Updated 2 years ago

NVlabs / FRAG
☆15 · Apr 25, 2025 · Updated last year

Vision-CAIR / LongVU
[ICML 2025] Official PyTorch implementation of LongVU
☆431 · May 8, 2025 · Updated last year

HJYao00 / MMReason
[ICCV 2025] MMReason, MLLMs, step by step, reasoning benchmark, AGI
☆15 · Apr 25, 2026 · Updated 3 months ago

DanDoge / Palm
team Doggeee's solution to Ego4D LTA challenge@CVPRW23'
☆14 · Nov 4, 2023 · Updated 2 years ago

Vinoground / Vinoground
☆13 · Apr 13, 2026 · Updated 3 months ago

TIGER-AI-Lab / VISTA
The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]
☆20 · Feb 27, 2025 · Updated last year

Beckschen / LLaVolta
[NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression
☆66 · Feb 19, 2025 · Updated last year

sosppxo / RG-SAN
[NeurIPS 2024 Oral] RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation
☆20 · Dec 22, 2024 · Updated last year

FatemehShiri / Spatial-MM
☆12 · Jan 10, 2025 · Updated last year

WisconsinAIVision / ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
☆339 · Jul 17, 2024 · Updated 2 years ago