ShyFoo / NemesisLinks

Official implementation of Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models (ICLR 2024 Spotlight)

☆13

Alternatives and similar repositories for Nemesis

Users that are interested in Nemesis are comparing it to the libraries listed below

Sorting:

2toinf / IVM
[NeurIPS-2024] The offical Implementation of "Instruction-Guided Visual Masking"
☆38Updated 9 months ago
Hongcheng-Gao / HAVEN
Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".
☆18Updated 3 months ago
Aurora-slz / MM-Verify
☆14Updated 6 months ago
DingchenYang99 / Pensieve
The official repo of our work "Pensieve: Retrospect-then-Compare mitigates Visual Hallucination"
☆16Updated last year
gyhdog99 / RACRO2
Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)
☆17Updated 2 months ago
Mid-Push / SmartCLIP
SmartCLIP: A training method to improve CLIP with both short and long texts
☆19Updated 2 months ago
z-x-yang / DoraemonGPT
Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models
☆86Updated last year
sled-group / moh
[NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models
☆30Updated 9 months ago
OoDBag / VisTA
VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection
☆19Updated 3 months ago
si0wang / VisVM
☆45Updated 8 months ago
MikeWangWZHL / dymu
☆19Updated 3 months ago
xuanlinli17 / large_vlm_distillation_ood
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023)
☆59Updated last year
JaaackHongggg / WorldSense
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
☆28Updated 4 months ago
yixuan730 / DetToolChain
Dettoolchain: A new prompting paradigm to unleash detection ability of MLLM
☆43Updated 10 months ago
zhyang2226 / OPA-DPO
[CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key
☆75Updated 3 months ago
zefang-liu / AdaMoLE
AdaMoLE: Adaptive Mixture of LoRA Experts
☆36Updated 10 months ago
Open-Reasoner-Zero / Open-Vision-Reasoner
The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning".
☆137Updated last month
ChenYi99 / EgoPlan
☆71Updated 9 months ago
mm-vl / ULM-R1
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
☆23Updated last month
yaolinli / DeCo
Code for DeCo: Decoupling token compression from semanchc abstraction in multimodal large language models
☆67Updated last month
njucckevin / MM-Self-Improve
A Self-Training Framework for Vision-Language Reasoning
☆82Updated 7 months ago
yliu-cs / SSR
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
☆18Updated 3 months ago
ADaM-BJTU / Mind_with_eyes_Awesome_MLLMs_Reasoning
This repository will continuously update the latest papers, technical reports, benchmarks about multimodal reasoning!
☆48Updated 5 months ago
XIAO4579 / Vlm-interpretability
Official implementation for the paper"Towards Understanding How Knowledge Evolves in Large Vision-Language Models"
☆18Updated 4 months ago
DAMO-NLP-SG / CMM
✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
☆48Updated last month
MCG-NJU / p-MoD
[ICCV 2025] p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay
☆42Updated 2 months ago
UMass-Embodied-AGI / CoVLM
[ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
☆45Updated 2 months ago
VincentDENGP / 3D-LR
Can 3D Vision-Language Models Truly Understand Natural Language?
☆21Updated last year
yuhui-zh15 / VLMClassifier
Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)
☆90Updated 10 months ago
tychen-SJTU / MECD-Benchmark
[NeurIPS'24 spotlight] MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
☆38Updated 2 months ago