chancharikmitra / SAVsLinks

Official Codebase for "Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers"

☆17

Alternatives and similar repositories for SAVs

Users that are interested in SAVs are comparing it to the libraries listed below

Sorting:

maifoundations / Visionary-R1
Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
☆40Updated 3 months ago
MME-Benchmarks / MME-RealWorld
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
☆134Updated this week
palchenli / VL-Instruction-Tuning
☆91Updated last year
kxfan2002 / SophiaVL-R1
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
☆85Updated 2 months ago
Shengcao-Cao / groundLMM
Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
☆40Updated last week
OpenGVLab / LCL
[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
☆70Updated 8 months ago
FeipengMa6 / VLoRA
[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
☆52Updated 6 months ago
opendatalab / HA-DPO
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
☆95Updated last year
alibaba / conv-llava
☆119Updated last year
JieShibo / MemVP
[ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
☆50Updated last year
si0wang / VisVM
☆45Updated 9 months ago
yfzhang114 / LLaVA-Align
[ACM Multimedia 2025] This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual…
☆82Updated 8 months ago
heliossun / SQ-LLaVA
Visual self-questioning for large vision-language assistant.
☆45Updated 3 months ago
BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆76Updated last year
OpenGVLab / Mono-InternVL
[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
☆87Updated 3 months ago
yuecao0119 / MMInstruct
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆59Updated 11 months ago
RifleZhang / LLaVA-Reasoner-DPO
☆95Updated 9 months ago
findalexli / mllm-dpo
[ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model
☆47Updated 11 months ago
wusize / F-LMM
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆104Updated 4 months ago
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆166Updated last year
UCSC-VLAA / VLAA-Thinking
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
☆136Updated 2 weeks ago
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆136Updated 5 months ago
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58Updated 2 years ago
callsys / ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆79Updated last year
UMass-Embodied-AGI / FlexAttention
[ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models
☆44Updated 9 months ago
Lackel / AGLA
[CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
☆49Updated last year
hasanar1f / HiRED
[AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…
☆41Updated 6 months ago
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆179Updated last year
Yuqifan1117 / HalluciDoctor
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)
☆48Updated last year
wuw2019 / LoTLIP
[NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
☆45Updated 9 months ago