ShyFoo / Nemesis
Official implementation of Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models (ICLR 2024 Spotlight)
☆13Updated 8 months ago
Alternatives and similar repositories for Nemesis
Users who are interested in Nemesis are comparing it to the libraries listed below
- [NeurIPS-2024] The official Implementation of "Instruction-Guided Visual Masking"☆38Updated 9 months ago
- Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".☆18Updated 3 months ago
- ☆14Updated 6 months ago
- The official repo of our work "Pensieve: Retrospect-then-Compare mitigates Visual Hallucination"☆16Updated last year
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)☆17Updated 2 months ago
- SmartCLIP: A training method to improve CLIP with both short and long texts☆19Updated 2 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆86Updated last year
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models☆30Updated 9 months ago
- VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection☆19Updated 3 months ago
- ☆45Updated 8 months ago
- ☆19Updated 3 months ago
- Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023)☆59Updated last year
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆28Updated 4 months ago
- DetToolChain: A new prompting paradigm to unleash detection ability of MLLM☆43Updated 10 months ago
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key☆75Updated 3 months ago
- AdaMoLE: Adaptive Mixture of LoRA Experts☆36Updated 10 months ago
- The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning".☆137Updated last month
- ☆71Updated 9 months ago
- Co-Reinforcement Learning for Unified Multimodal Understanding and Generation☆23Updated last month
- Code for DeCo: Decoupling token compression from semantic abstraction in multimodal large language models☆67Updated last month
- A Self-Training Framework for Vision-Language Reasoning☆82Updated 7 months ago
- SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning☆18Updated 3 months ago
- This repository will continuously update the latest papers, technical reports, benchmarks about multimodal reasoning!☆48Updated 5 months ago
- Official implementation for the paper "Towards Understanding How Knowledge Evolves in Large Vision-Language Models"☆18Updated 4 months ago
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio☆48Updated last month
- [ICCV 2025] p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay☆42Updated 2 months ago
- [ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding☆45Updated 2 months ago
- Can 3D Vision-Language Models Truly Understand Natural Language?☆21Updated last year
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)☆90Updated 10 months ago
- [NeurIPS'24 spotlight] MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning☆38Updated 2 months ago