Jingfeng0705 / LIFTLinks

The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders

☆34

Alternatives and similar repositories for LIFT

Users that are interested in LIFT are comparing it to the libraries listed below

Sorting:

markywg / transagent
[NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
☆24Updated 9 months ago
iancovert / locality-alignment
☆51Updated 6 months ago
Haochen-Wang409 / TreeVGR
Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"
☆49Updated 3 weeks ago
LINs-lab / GMem
[Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models
☆39Updated 4 months ago
techmonsterwang / iLLaMA
Adapting LLaMA Decoder to Vision Transformer
☆29Updated last year
yu-rp / Dimple
Dimple, the first Discrete Diffusion Multimodal Large Language Model
☆85Updated last month
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆85Updated 10 months ago
yuecao0119 / MMFuser
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …
☆57Updated 9 months ago
om-ai-lab / ZoomEye
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
☆47Updated 7 months ago
zhijie-group / UniCMs
☆37Updated 2 months ago
EmmaSRH / ARVFM
Awesome autoregressive vision foundation models
☆25Updated 7 months ago
ByteDance-Seed / SAIL
Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer"
☆58Updated 2 weeks ago
RenShuhuai-Andy / NBP
Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
☆38Updated 5 months ago
OpenGVLab / Multitask-Model-Selector
[NIPS2023]Implementation of Foundation Model is Efficient Multimodal Multitask Model Selector
☆37Updated last year
SEU-VIPGroup / Understanding_Vision_Tasks
☆12Updated 6 months ago
inclusionAI / M2-Reasoning
M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning
☆35Updated 3 weeks ago
microsoft / x-reasoner
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
☆47Updated 3 months ago
m1k2zoo / negbench
Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"
☆27Updated 3 months ago
locuslab / llava-token-compression
☆43Updated 9 months ago
hulianyuyy / iLLaVA
iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models
☆19Updated 6 months ago
eric-ai-lab / GRIT
Official code for paper "GRIT: Teaching MLLMs to Think with Images"
☆115Updated this week
Mr-Loevan / FAST
Fast-Slow Thinking for Large Vision-Language Model Reasoning
☆17Updated 3 months ago
OpenGVLab / PVC
[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
☆45Updated last month
sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆32Updated 5 months ago
wjpoom / SPEC
[CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding"
☆45Updated last month
hammoudhasan / DiffCLIP
Official Implementation of DiffCLIP: Differential Attention Meets CLIP
☆38Updated 4 months ago
UMass-Embodied-AGI / FlexAttention
[ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models
☆41Updated 7 months ago
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆54Updated 2 weeks ago
OpenSparseLLMs / CLIP-MoE
CLIP-MoE: Mixture of Experts for CLIP
☆42Updated 9 months ago
sjz5202 / LLaVA-Reward
Official repository for LLaVA-Reward (ICCV 2025): Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
☆14Updated last week