[CVPR 2026] Thinking with Programming Vision: Towards a Unified View for Thinking with Images
☆63Jan 23, 2026Updated last month
Alternatives and similar repositories for CodeVision
Users who are interested in CodeVision are comparing it to the repositories listed below.
- ☆66Feb 1, 2026Updated last month
- Math-VR Benchmark & CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images☆55Nov 4, 2025Updated 4 months ago
- RACE is a multi-dimensional benchmark for code generation that focuses on Readability, mAintainability, Correctness, and Efficiency.☆12Oct 12, 2024Updated last year
- ☆61Feb 27, 2026Updated 3 weeks ago
- Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning☆44Jul 2, 2025Updated 8 months ago
- ☆12Jan 9, 2024Updated 2 years ago
- Official repository for ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use☆29Nov 4, 2025Updated 4 months ago
- [ICLR 2026] "VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use"☆165Updated this week
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence☆10Mar 2, 2025Updated last year
- A unified reinforcement learning toolbox for joint RL on language models and diffusion models☆75Feb 7, 2026Updated last month
- ☆17May 25, 2025Updated 9 months ago
- ☆26Feb 12, 2026Updated last month
- Official repository for Polarity Sampling, CVPR 2022 ORAL☆13Jul 25, 2022Updated 3 years ago
- The code repository for "OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions"☆13Feb 21, 2025Updated last year
- [ICCV 2023] PyTorch implementation of "Spatial-Aware Token for Weakly Supervised Object Localization".☆23Oct 24, 2023Updated 2 years ago
- [ACL 2024] An easily extensible framework for simultaneous, text-to-text neural machine translation (SimulMT) for LLMs.☆18Apr 21, 2025Updated 10 months ago
- The code repository for "Wings: Learning Multimodal LLMs without Text-only Forgetting" [NeurIPS 2024]☆26Dec 28, 2024Updated last year
- [ICLR 2019] Unsupervised Discovery of Parts, Structure, and Dynamics☆46Dec 26, 2022Updated 3 years ago
- Official Implementation of MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models☆12Nov 1, 2025Updated 4 months ago
- [ICLR 2026] SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models☆78Mar 9, 2026Updated last week
- [ICLR'25] Official repository for "AVHBench: A Cross-Modal Hallucination Evaluation for Audio-Visual Large Language Models"☆20Mar 8, 2026Updated last week
- [FCS'24] LVLM Safety paper☆19Jan 4, 2025Updated last year
- Doodling our way to AGI ✏️ 🖼️ 🧠☆122May 29, 2025Updated 9 months ago
- Codes and data for AAAI-24 paper "Advancing Spatial Reasoning in Large Language Models: An In-depth Evaluation and Enhancement Using the …☆14Apr 23, 2024Updated last year
- Code for the paper: "Modular Neural Image Signal Processing". A modular neural ISP with interpretable stages, multi-style rendering, cros…☆33Jan 19, 2026Updated 2 months ago
- ☆13Nov 5, 2024Updated last year
- Unofficial PyTorch implementation of the paper "Representative Color Transform for Image Enhancement" by Kim et al. (2021), ICCV2021☆13May 24, 2023Updated 2 years ago
- Official repository for Gait Recognition with Drones: A benchmark (TMM 2023)☆10Feb 2, 2024Updated 2 years ago
- ☆34Jan 9, 2026Updated 2 months ago
- [NeurIPS 2024] Official Code for the Paper OoD-ViT-NAS: Vision Transformer Neural Architecture Search for Out-of-Distribution Generalizat…☆13Dec 3, 2024Updated last year
- Why do deep convolutional networks generalize so poorly to small image transformations?☆11Jun 23, 2019Updated 6 years ago
- ☆74May 22, 2025Updated 9 months ago
- The open-source code for the NeurIPS 2025 paper, "Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learn…☆50Jan 5, 2026Updated 2 months ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …☆86Jan 21, 2026Updated last month
- Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions☆26Feb 11, 2026Updated last month
- Code for ACL 2023 paper "Rethinking Multimodal Entity and Relation Extraction from a Translation Point of View"☆25Jan 18, 2026Updated 2 months ago
- CVE-Factory☆65Feb 13, 2026Updated last month
- EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning [🔥The Exploration of R1 for General Audio-Vi…☆75May 18, 2025Updated 10 months ago
- Generating Summaries with Controllable Readability Levels (EMNLP 2023)☆15Aug 6, 2025Updated 7 months ago