ziqipang / LM4VisualEncoding
[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
☆229 · Updated last year
Alternatives and similar repositories for LM4VisualEncoding:
Users interested in LM4VisualEncoding are comparing it to the repositories listed below.
- ☆304 · Updated 11 months ago
- [NeurIPS 2023] Text data, code and pre-trained models for the paper "Improving CLIP Training with Language Rewrites" ☆264 · Updated last year
- Official implementation of the Law of Vision Representation in MLLMs ☆145 · Updated 2 months ago
- Official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition" ☆184 · Updated last month
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … ☆202 · Updated 3 weeks ago
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024) ☆66 · Updated 2 months ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆109 · Updated 8 months ago
- Open-source implementation of "Vision Transformers Need Registers" ☆162 · Updated 2 months ago
- A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models ☆123 · Updated last year
- Evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆112 · Updated 6 months ago
- ☆117 · Updated 6 months ago
- [CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities ☆99 · Updated 10 months ago
- [NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models ☆261 · Updated 3 months ago
- The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A su… ☆218 · Updated this week
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" ☆134 · Updated 4 months ago
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆136 · Updated this week
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆79 · Updated 9 months ago
- Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆68 · Updated 7 months ago
- [NeurIPS 2024] Dense Connector for MLLMs ☆154 · Updated 3 months ago
- 🔥 Official implementation of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation" ☆222 · Updated 2 weeks ago
- When do we not need larger vision models? ☆354 · Updated last month
- Official repo for "VisionZip: Longer is Better but Not Necessary in Vision Language Models" ☆219 · Updated 2 weeks ago
- [CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs ☆137 · Updated 5 months ago
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆308 · Updated 6 months ago
- ☆132 · Updated last year
- My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution" ☆207 · Updated 2 months ago
- VILA-U: A Unified Foundation Model Integrating Visual Understanding and Generation ☆199 · Updated this week
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely… ☆49 · Updated last year
- ☆44 · Updated 8 months ago
- [CVPR 2024] ViT-Lens: Towards Omni-modal Representations ☆169 · Updated 6 months ago