snumprlab / hima
Official Implementation of HIMA (COLM'25)
☆16Updated last month
Alternatives and similar repositories for hima
Users who are interested in hima are comparing it to the repositories listed below
- A Text2SQL benchmark for evaluation of Large Language Models☆38Updated last week
- ☆14Updated 7 months ago
- [NeurIPS'25] The official code of "PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning"☆24Updated last month
- ☆17Updated 9 months ago
- The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".☆15Updated last month
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios☆16Updated last year
- TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs☆23Updated last month
- ☆24Updated 2 months ago
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…☆37Updated 5 months ago
- ☆16Updated 4 months ago
- [CVPR2025] VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding☆18Updated 7 months ago
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM"☆16Updated last week
- Preference Learning for LLaVA☆51Updated 11 months ago
- ☆22Updated 5 months ago
- [ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents☆19Updated 2 weeks ago
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)☆19Updated 3 months ago
- ∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation☆18Updated 8 months ago
- Official Repo for SvS: A Self-play with Variational Problem Synthesis strategy for RLVR training☆39Updated 2 months ago
- [NeurIPS 2025] Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models☆47Updated last month
- Official implementation of paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact…☆36Updated 8 months ago
- ☆49Updated 5 months ago
- Plancraft is a Minecraft environment and agent suite to test planning capabilities in LLMs☆20Updated 3 months ago
- ☆14Updated 10 months ago
- [NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat…☆46Updated last year
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks☆31Updated 2 months ago
- ☆14Updated 9 months ago
- GitHub repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)☆51Updated 5 months ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models☆32Updated 11 months ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"☆16Updated 7 months ago
- [NeurIPS 2025] Official Implementation for "Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding"☆20Updated 10 months ago