alwynpan / uom-comp90024
Demo Code for Subject COMP90024
☆12 · Updated 5 months ago
Alternatives and similar repositories for uom-comp90024
Users interested in uom-comp90024 are comparing it to the repositories listed below.
- ☆69 · Updated 10 months ago
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024) ☆91 · Updated 11 months ago
- Awesome papers for multi-modal LLMs with grounding ability ☆19 · Updated last year
- Official PyTorch implementation for the paper "ProAPO: Progressively Automatic Prompt Optimization for Visual Classification". The paper is a… ☆22 · Updated 4 months ago
- Project Description ☆22 · Updated last year
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models ☆102 · Updated 11 months ago
- Imagine While Reasoning in Space: Multimodal Visualization-of-Thought (ICML 2025) ☆48 · Updated 5 months ago
- Study notes on the official LLaVA code ☆29 · Updated 11 months ago
- [ICML 2024 Spotlight] "Sample-specific Masks for Visual Reprogramming-based Prompting" ☆12 · Updated 9 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ☆109 · Updated 2 weeks ago
- Co-Reward: Self-supervised RL for LLM Reasoning via Contrastive Agreement ☆29 · Updated last month
- ☆101 · Updated 6 months ago
- ☆58 · Updated 10 months ago
- Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grain… ☆92 · Updated last month
- [ICML 2024 Oral] Official code repository for MLLM-as-a-Judge ☆84 · Updated 7 months ago
- [NeurIPS '24 Spotlight] MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning ☆40 · Updated 2 months ago
- [CVPR 2024] Official code for the paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" ☆135 · Updated last year
- [ICML '25] Official implementation of the paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" ☆158 · Updated 3 months ago
- [CVPR '25] Interleaved-Modal Chain-of-Thought ☆86 · Updated last month
- ☆16 · Updated 10 months ago
- [ICLR 2025] Data-Augmented Phrase-Level Alignment for Mitigating Object Hallucination ☆16 · Updated 7 months ago
- [ICCV 2025] VisRL: Intention-Driven Visual Perception via Reinforced Reasoning ☆39 · Updated 3 months ago
- Enhancing Large Vision-Language Models with Self-Training on Image Comprehension ☆70 · Updated last year
- VLM Evaluation: a benchmark for VLMs, spanning text-generation tasks from VQA to captioning ☆126 · Updated last year
- [ICLR 2025] "Noisy Test-Time Adaptation in Vision-Language Models" ☆12 · Updated 7 months ago
- [ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality ☆39 · Updated 2 months ago
- [ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models ☆49 · Updated 7 months ago
- [NeurIPS '24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … ☆374 · Updated 9 months ago
- A hot-pluggable tool for visualizing LLaVA's attention ☆23 · Updated last year
- The official repo for "Where do Large Vision-Language Models Look at when Answering Questions?" ☆47 · Updated 4 months ago