mybearyZhang / TwoStageReason
Official implementation of ECCV 2024 paper: Take A Step Back: Rethinking the Two Stages in Visual Reasoning
☆14Updated 3 weeks ago
Alternatives and similar repositories for TwoStageReason
Users who are interested in TwoStageReason are comparing it to the repositories listed below
- Preview code of ECCV'24 paper "Distill Gold from Massive Ores" (BiLP)☆24Updated 11 months ago
- Official implementation of Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement.☆31Updated 9 months ago
- Code for our ICML'24 paper on multimodal dataset distillation☆37Updated 8 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆85Updated 9 months ago
- TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos☆51Updated last week
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models☆29Updated 7 months ago
- ☆24Updated 4 months ago
- Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"☆29Updated last month
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key☆61Updated 3 weeks ago
- ☆13Updated 2 months ago
- The official implementation of "Grounded Chain-of-Thought for Multimodal Large Language Models"☆12Updated 3 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta…☆43Updated last week
- A paper list for spatial reasoning☆94Updated 2 weeks ago
- ☆37Updated 11 months ago
- MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer☆44Updated 9 months ago
- Imagine While Reasoning in Space: Multimodal Visualization-of-Thought (ICML 2025)☆29Updated 2 months ago
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…☆23Updated last month
- A curated list of awesome papers on dataset reduction, including dataset distillation (dataset condensation) and dataset pruning (coreset…☆55Updated 5 months ago
- Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models☆29Updated last week
- ☆71Updated 6 months ago
- Accepted by CVPR 2024☆34Updated last year
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models☆34Updated 7 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆76Updated last year
- 【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs"☆19Updated 9 months ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment☆31Updated 9 months ago
- [Blog 1] Recording a bug of grpo_trainer in some R1 projects☆20Updated 4 months ago
- ☆16Updated last year
- Official PyTorch Code of ReKV (ICLR'25)☆28Updated 3 months ago
- [ICLR 2024 Poster] SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos☆18Updated 7 months ago
- MMICL, a state-of-the-art VLM with in-context learning ability, from ICL, PKU☆47Updated last year