Hao840 / ADEM-VL

PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning"

☆20

Alternatives and similar repositories for ADEM-VL

Users interested in ADEM-VL are comparing it to the repositories listed below.


SHI-Labs / VisPer-LM
[NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation, arXiv 2024
☆64 · Updated last month
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆62 · Updated 4 months ago
kkyuhun94 / dalda
[ECCV'24 Workshops Oral] DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling
☆30 · Updated last year
OpenGVLab / FluxViT
Make Your Training Flexible: Towards Deployment-Efficient Video Models
☆34 · Updated 5 months ago
mbzuai-oryx / VideoMolmo
Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing"
☆54 · Updated 4 months ago
top-yun / SPARK
A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models.
☆19 · Updated 10 months ago
MCR-PEFT / Ex-MCR
☆45 · Updated 6 months ago
kyegomez / MC-ViT
Implementation of MC-ViT from the paper "Memory Consolidation Enables Long-Context Video Understanding"
☆24 · Updated 3 weeks ago
RenShuhuai-Andy / TESTA
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
☆49 · Updated last year
NVlabs / STL
Official PyTorch implementation of Self-emerging Token Labeling
☆35 · Updated last year
xinghaochen / SqueezeTime
Official PyTorch implementation of "No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding"
☆32 · Updated last year
Gahyeonkim09 / AAPL
AAPL: Adding Attributes to Prompt Learning for Vision-Language Models (CVPRw 2024)
☆34 · Updated last year
WeihuangLin / INF-LLaVA
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
☆42 · Updated last year
m1k2zoo / negbench
Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"
☆38 · Updated 7 months ago
Yui010206 / CREMA
[ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
☆54 · Updated 4 months ago
XiaoduoAILab / XmodelVLM
☆69 · Updated last year
mbzuai-oryx / Agent-X
Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
☆32 · Updated last week
FreedomIntelligence / LongLLaVA
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆211 · Updated 10 months ago
AtsuMiyai / UPD
[ACL 2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
☆78 · Updated 5 months ago
Christina200 / Online-LoRA-official
[WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L…
☆51 · Updated 2 months ago
jihaonew / MM-Instruct
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
☆35 · Updated last year
sunsmarterjie / ChatterBox
[AAAI 2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues
☆58 · Updated 6 months ago
tulip-berkeley / open_clip
An open source implementation of CLIP (With TULIP Support)
☆163 · Updated 6 months ago
qiuzh20 / RMoE
Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts)
☆27 · Updated last year
MacavityT / REF-VLM
☆30 · Updated 8 months ago
yihedeng9 / OpenVLThinker
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
☆119 · Updated 3 months ago
mfarre / Video-LLaVA-7B-hf-CinePile
Video-LLaVA fine-tune for CinePile evaluation
☆51 · Updated last year
google-deepmind / tips
☆106 · Updated 7 months ago
DCDmllm / MorphTokens
☆43 · Updated last year
HaroldChen19 / VistaDPO
[ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
☆36 · Updated 5 months ago