pokerme7777 / Compositional-Visual-Reasoning-Survey
☆76Updated last week
Alternatives and similar repositories for Compositional-Visual-Reasoning-Survey
Users who are interested in Compositional-Visual-Reasoning-Survey are comparing it to the repositories listed below
- Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS☆1,213Updated 5 months ago
- ✨✨R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning☆253Updated 4 months ago
- [NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models☆116Updated last year
- [ICLR 2025] Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models☆58Updated 7 months ago
- Efficient Reasoning Vision Language Models☆370Updated 2 weeks ago
- SDAR (Synergy of Diffusion and AutoRegression), a large diffusion language model (1.7B, 4B, 8B, 30B)☆187Updated this week
- ☆65Updated 6 months ago
- R1-like Computer-use Agent☆84Updated 5 months ago
- Awesome-Efficient-Inference-for-LRMs is a collection of state-of-the-art, novel, exciting, token-efficient methods for Large Reasoning Mo…☆180Updated 2 months ago
- Official Repository of OmniCaptioner☆160Updated 4 months ago
- [Arxiv] Discrete Diffusion in Large Language and Multimodal Models: A Survey☆280Updated 2 weeks ago
- (ICCV 2025) Enhance CLIP and MLLM's fine-grained visual representations with generative models.☆70Updated 2 months ago
- Codebase for Iterative DPO Using Rule-based Rewards☆257Updated 5 months ago
- [NeurIPS2024] Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging☆135Updated 5 months ago
- [ICML 2025] "SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator"☆541Updated last month
- Official repository of "Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models"☆133Updated 3 weeks ago
- [NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation☆107Updated 11 months ago
- Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models☆178Updated 10 months ago
- ✨ A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions - all in one framework☆264Updated last week
- [ECCV 2024] Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?☆169Updated 4 months ago
- DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models☆142Updated 8 months ago
- [ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++☆189Updated last month
- A collection of multimodal reasoning papers, codes, datasets, benchmarks and resources.☆299Updated last week
- [ICML2025] Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment☆124Updated 2 months ago
- 🔥 🔥 🔥 [NeurIPS 2024] Official Implementation of Hawk: Learning to Understand Open-World Video Anomalies☆218Updated 4 months ago
- A Gaussian dense reward framework for GUI grounding training☆223Updated 2 weeks ago
- Think Beyond Images☆474Updated last week
- (ICCV-2025 Official Code) Improving Generalist Model with Domain-Specific Experts☆85Updated 2 months ago
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better☆140Updated 2 months ago
- An open-source implementation for training LLaVA-NeXT.☆419Updated 10 months ago