contrastive/FreeVideoLLM

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/contrastive/FreeVideoLLM)

contrastive / FreeVideoLLM

☆83

Alternatives and similar repositories for FreeVideoLLM

Users that are interested in FreeVideoLLM are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

apple / ml-slowfast-llava
View on GitHub
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
☆291Sep 16, 2024Updated last year
tingyu215 / TS-LLaVA
View on GitHub
TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models
☆17Jan 2, 2025Updated last year
mbzuai-oryx / Video-CoM
View on GitHub
Video-CoM: Interactive Video Reasoning via Chain of Manipulations
☆22Jun 17, 2026Updated last month
RenShuhuai-Andy / TESTA
View on GitHub
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
☆50Jan 9, 2024Updated 2 years ago
imagegridworth / IG-VLM
View on GitHub
☆138Sep 29, 2024Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
whwu95 / FreeVA
View on GitHub
FreeVA: Offline MLLM as Training-Free Video Assistant
☆69Jun 9, 2024Updated 2 years ago
Cooperx521 / PyramidDrop
View on GitHub
(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
☆151Mar 6, 2025Updated last year
markywg / transagent
View on GitHub
[NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
☆25Oct 17, 2024Updated last year
ParadoxZW / LLaVA-UHD-Better
View on GitHub
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
☆35Aug 12, 2024Updated last year
mlvlab / VidChain
View on GitHub
Official Implementation (Pytorch) of the "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Capti…
☆25Jan 26, 2025Updated last year
TimeMarker-LLM / TimeMarker
View on GitHub
A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
☆107Nov 28, 2024Updated last year
yellow-binary-tree / HawkEye
View on GitHub
Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos
☆47Apr 29, 2024Updated 2 years ago
SUSTechBruce / LOOK-M
View on GitHub
[EMNLP 2024 Findings🔥] Official implementation of ": LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…
☆103Nov 9, 2024Updated last year
sosppxo / 3D-STMN
View on GitHub
[AAAI 2024] The official implementation of the paper "3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Refer…
☆45Dec 20, 2023Updated 2 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
williamium3000 / awesome-mllm-grounding
View on GitHub
Awesome paper for multi-modal llm with grounding ability
☆21Oct 11, 2025Updated 9 months ago
zihuixue / MKE
View on GitHub
[ICCV 2021] Multimodal Knowledge Expansion
☆10Aug 28, 2021Updated 4 years ago
TerryPei / CSP
View on GitHub
Cross-Self KV Cache Pruning for Efficient Vision-Language Inference
☆10Dec 15, 2024Updated last year
dongyh20 / Insight-V
View on GitHub
[CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
☆240Nov 7, 2025Updated 8 months ago
jongwoopark7978 / LVNet
View on GitHub
[Main Conference @ EACL'26] [Workshop @ NeurIPS'24] 🎞️ LVNet.
☆44Feb 10, 2026Updated 5 months ago
mlvlab / VT-TWINS
View on GitHub
Video-Text Representation Learning via Differentiable Weak Temporal Alignment (CVPR 2022)
☆19Apr 19, 2024Updated 2 years ago
rccchoudhury / rlt
View on GitHub
Official Implementation for our NeurIPS 2024 paper, "Don't Look Twice: Run-Length Tokenization for Faster Video Transformers".
☆238Mar 29, 2025Updated last year
hulianyuyy / iLLaVA
View on GitHub
iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models (ICLR2026)
☆23Jun 24, 2026Updated last month
takomc / amp
View on GitHub
【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs"
☆22Sep 26, 2024Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
NeverMoreLCH / SearchLVLMs
View on GitHub
Repository for the NeurIPS 2024 paper "SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up…
☆25Dec 9, 2024Updated last year
strongwolf / CDG
View on GitHub
☆11Jan 20, 2022Updated 4 years ago
zhangquanchen / VisRL
View on GitHub
[ICCV 2025] VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
☆47Nov 8, 2025Updated 8 months ago
mrwu-mac / ControlMLLM
View on GitHub
[NeurIPS2024] Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models'
☆211Jul 17, 2025Updated last year
baopj / E3M
View on GitHub
[ECCV 2024] The first zero-shot setting for spatio-temporal video grounding.
☆11Jul 16, 2024Updated 2 years ago
songw-zju / PixelThink
View on GitHub
The official implementation of "PixelThink: Towards Efficient Chain-of-Pixel Reasoning" (ICML 2026)
☆43Jul 4, 2026Updated 3 weeks ago
yihedeng9 / STIC
View on GitHub
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
☆68May 31, 2024Updated 2 years ago
gyxxyg / VTG-LLM
View on GitHub
[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
☆130Dec 10, 2024Updated last year
bronyayang / HallE_Control
View on GitHub
HallE-Control: Controlling Object Hallucination in LMMs
☆32Apr 10, 2024Updated 2 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
IVGSZ / Flash-VStream
View on GitHub
This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams"
☆287Oct 15, 2025Updated 9 months ago
gls0425 / LinVT
View on GitHub
LinVT: Empower Your Image-level Large Language Model to Understand Videos
☆84Dec 30, 2024Updated last year
lyongo / NWPU-MOC
View on GitHub
This is a repository about NWPU-MOC dataset and code.
☆23Jan 24, 2024Updated 2 years ago
TencentARC / TaCA
View on GitHub
Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter".
☆16Jun 20, 2023Updated 3 years ago
showlab / VideoLISA
View on GitHub
[NeurlPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
☆148Dec 26, 2024Updated last year
lzhxmu / VTW
View on GitHub
Code release for VTW (AAAI 2025 Oral)
☆68Nov 4, 2025Updated 8 months ago
HighwayWu / ST-DDL
View on GitHub
☆16Feb 27, 2025Updated last year