(ArXiv25) Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning
☆59 Sep 30, 2025 Updated 5 months ago
Alternatives and similar repositories for Vision-Matters
Users that are interested in Vision-Matters are comparing it to the libraries listed below.
- ☆23 Jan 9, 2026 Updated 2 months ago
- [ICCV 2025] Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation. ☆52 Aug 27, 2025 Updated 6 months ago
- (ICML 2024) PyTorch implementation of "Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian Processes" ☆16 Oct 15, 2024 Updated last year
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" ☆35 Jun 12, 2025 Updated 9 months ago
- Official implementation of MOST: Multiple object localization with self-supervised transformers published at ICCV 2023 ☆17 Mar 20, 2024 Updated 2 years ago
- [ACL'25] UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench ☆35 Aug 12, 2025 Updated 7 months ago
- ☆22 Aug 8, 2025 Updated 7 months ago
- Official Implementation of "VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning". ☆63 Nov 20, 2025 Updated 4 months ago
- [ICML 2025 Spotlight] RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding ☆19 Mar 2, 2025 Updated last year
- 💻 Terminal-Agent with Human-in-the-Loop Learning ☆39 Jan 16, 2026 Updated 2 months ago
- Pytorch implementation of "SKEL-CF: Coarse-to-Fine Biomechanical Skeleton and Surface Mesh Recovery" ☆53 Mar 17, 2026 Updated last week
- [IEEE TII 2025] Official Implementation for "Dual-Detector Reoptimization for Federated Weakly Supervised Video Anomaly Detection via Ada… ☆26 Nov 11, 2025 Updated 4 months ago
- [ICLR 2024] Towards Robust Multi-Modal Reasoning via Model Selection ☆15 Mar 7, 2024 Updated 2 years ago
- GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts ☆40 Sep 30, 2025 Updated 5 months ago
- Source code for the paper "Memory-Efficient Fine-Tuning via Low-Rank Activation Compression" ☆14 Aug 1, 2025 Updated 7 months ago
- ☆20 Apr 16, 2025 Updated 11 months ago
- [CVPR2025] The implementation of the paper "OODD: Test-time Out-of-Distribution Detection with Dynamic Dictionary". ☆18 May 9, 2025 Updated 10 months ago
- ☆24 Sep 12, 2024 Updated last year
- ☆17 Dec 23, 2025 Updated 3 months ago
- The official repository of 'Unnatural Language Are Not Bugs but Features for LLMs' ☆24 May 20, 2025 Updated 10 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25] ☆186 Jun 5, 2025 Updated 9 months ago
- [MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications. ☆41 Apr 7, 2025 Updated 11 months ago
- MMoE: Multimodal Mixture-of-Experts (EMNLP 2024) ☆14 Nov 14, 2024 Updated last year
- ☆13 Feb 5, 2022 Updated 4 years ago
- Official code for DeepSound-V1 ☆13 May 14, 2025 Updated 10 months ago
- ☆10 Jul 11, 2022 Updated 3 years ago
- Code for "CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models" ☆19 Mar 10, 2025 Updated last year
- Code for Heima ☆59 Apr 21, 2025 Updated 11 months ago
- [CVPR 2024] Friendly Sharpness-Aware Minimization ☆36 Oct 29, 2024 Updated last year
- SAEval: A benchmark for sentiment analysis to evaluate the model's performance on various subtasks. ☆14 Apr 29, 2024 Updated last year
- [NeurIPS 2024] Official code for the paper 'RankUp: Boosting Semi-Supervised Regression with an Auxiliary Ranking Classifier' ☆14 Aug 22, 2025 Updated 7 months ago
- ☆41 Jul 3, 2025 Updated 8 months ago
- [ECCV 2024] Official Implementation of "Disentangling Masked Autoencoders for Unsupervised Domain Generalization" ☆14 Jul 31, 2024 Updated last year
- WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning ☆36 Jun 10, 2025 Updated 9 months ago
- [MM 2023 Oral] Online Distillation-enhanced Multi-modal Transformer for Sequential Recommendation ☆17 Jan 10, 2024 Updated 2 years ago
- [IROS 2022] Transporters with Visual Foresight (TVF) ☆11 Jul 25, 2022 Updated 3 years ago
- Official Pytorch implementation of NeuralWalker (ICLR 2025) ☆38 Jun 25, 2025 Updated 8 months ago
- ☆14 Feb 26, 2025 Updated last year
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆57 May 28, 2025 Updated 9 months ago