ziqipang / LM4VisualEncoding
[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
☆238 · Updated last year
Alternatives and similar repositories for LM4VisualEncoding
Users interested in LM4VisualEncoding are comparing it to the libraries listed below.
- Official implementation of the Law of Vision Representation in MLLMs ☆156 · Updated 7 months ago
- [TMLR] Public code repo for the paper "A Single Transformer for Scalable Vision-Language Modeling" ☆143 · Updated 7 months ago
- Open-source implementation of "Vision Transformers Need Registers" ☆182 · Updated 2 months ago
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ☆314 · Updated last year
- [CVPR 2025] Official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models" ☆169 · Updated last week
- SVIT: Scaling up Visual Instruction Tuning ☆163 · Updated last year
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆134 · Updated last year
- [NeurIPS 2024] Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" ☆85 · Updated 8 months ago
- Official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition" ☆215 · Updated 3 weeks ago
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning ☆125 · Updated last year
- [NeurIPS 2023] Text data, code, and pre-trained models for the paper "Improving CLIP Training with Language Rewrites" ☆282 · Updated last year
- Densely Captioned Images (DCI) dataset repository ☆185 · Updated 11 months ago
- Official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" ☆209 · Updated last year
- A comprehensive benchmark and toolkit for evaluating video-based large language models ☆128 · Updated last year
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training ☆209 · Updated 3 months ago
- [NeurIPS 2024] Evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models" ☆182 · Updated 8 months ago
- [NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆69 · Updated 4 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆131 · Updated last month
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆263 · Updated 11 months ago
- [ICLR 2024] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning ☆280 · Updated last year
- [NeurIPS 2024 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … ☆331 · Updated 6 months ago
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆179 · Updated 3 weeks ago
- ChatBridge: an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely… ☆51 · Updated last year
- PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models" ☆199 · Updated 5 months ago
- [CVPR 2025] PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆109 · Updated 3 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆117 · Updated 2 months ago
- Explore the Limits of Omni-modal Pretraining at Scale ☆103 · Updated 9 months ago