ShyFoo / Nemesis
Official implementation of Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models (ICLR 2024 Spotlight)
☆15Updated last year
Alternatives and similar repositories for Nemesis
Users that are interested in Nemesis are comparing it to the libraries listed below
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)☆19Updated 7 months ago
- [NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking"☆41Updated last year
- [CVPR 2024] Code for HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation☆76Updated last year
- ☆18Updated 3 months ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models☆34Updated last year
- The official repo of our work "Pensieve: Retrospect-then-Compare mitigates Visual Hallucination"☆15Updated last year
- [NeurIPS'24 spotlight] MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning. [TPAMI'25] MECD+☆45Updated 3 months ago
- [NeurIPS 2025] Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO☆78Updated 3 months ago
- [NeurIPS 2025] Official Implementation for "Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding"☆22Updated last year
- ☆46Updated last year
- Can 3D Vision-Language Models Truly Understand Natural Language?☆20Updated last year
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback☆76Updated last year
- [NeurIPS 2025] The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reason…☆153Updated 4 months ago
- ☆24Updated 8 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆89Updated this week
- [ICLR 2026] Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning"☆49Updated last week
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆84Updated 3 months ago
- Code for DeCo: Decoupling token compression from semantic abstraction in multimodal large language models☆77Updated 6 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"☆28Updated 6 months ago
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key☆104Updated last month
- VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection☆21Updated 8 months ago
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)☆96Updated last year
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension.☆69Updated last year
- [NeurIPS'25] The official code of "PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning"☆30Updated 3 weeks ago
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)☆52Updated last year
- Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023)☆60Updated last year
- The open-source code for the NeurIPS 2025 paper, "Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learn…☆40Updated last month
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.☆85Updated last year
- [IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning☆79Updated last year
- ☆21Updated last year