Yushi-Hu / VisualSketchpadLinks

Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

☆249

Alternatives and similar repositories for VisualSketchpad

Users that are interested in VisualSketchpad are comparing it to the libraries listed below

Sorting:

dongyh20 / Insight-V
[CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
☆216Updated last month
DAMO-NLP-SG / multimodal_textbook
[ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"
☆168Updated 4 months ago
TIGER-AI-Lab / VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"
☆134Updated 2 months ago
zhaochen0110 / OpenThinkIMG
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
☆287Updated 2 months ago
mbzuai-oryx / LlamaV-o1
[ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs
☆305Updated 2 months ago
TIGER-AI-Lab / Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024]
☆223Updated 4 months ago
ritzz-ai / GUI-R1
Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents
☆156Updated 3 months ago
yihedeng9 / OpenVLThinker
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
☆104Updated 2 weeks ago
EvolvingLMMs-Lab / multimodal-sae
[ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.
☆146Updated 3 weeks ago
Osilly / Awesome-Interleaving-Reasoning
Interleaving Reasoning: Next-Generation Reasoning Systems for AGI
☆105Updated 3 weeks ago
InfiMM / Awesome-Multimodal-LLM-for-Math-STEM
Paper collections of multi-modal LLM for Math/STEM/Code.
☆117Updated last week
para-lost / AutoPresent
Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" (CVPR 2025)
☆116Updated 2 months ago
LengSicong / MMR1
MMR1: Advancing the Frontiers of Multimodal Reasoning
☆162Updated 4 months ago
TIGER-AI-Lab / Pixel-Reasoner
Pixel-Level Reasoning Model trained with RL
☆180Updated last month
MMStar-Benchmark / MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
☆189Updated 10 months ago
DataArcTech / ChartMoE
[ICLR2025 Oral] ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding
☆86Updated 4 months ago
RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆105Updated 2 months ago
TideDra / VL-RLHF
A RLHF Infrastructure for Vision-Language Models
☆179Updated 8 months ago
mathllm / MATH-V
[NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities.
☆110Updated 2 months ago
LeslieTrue / SFTvsRL
Official implementation of paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
☆289Updated 3 months ago
EvolvingLMMs-Lab / multimodal-search-r1
MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too…
☆272Updated last week
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆144Updated 8 months ago
JieyuZ2 / TaskMeAnything
[NeurIPS 2024] A task generation and model evaluation system for multimodal language models.
☆72Updated 8 months ago
njucckevin / MM-Self-Improve
A Self-Training Framework for Vision-Language Reasoning
☆80Updated 6 months ago
NUS-TRAIL / NoisyRollout
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
☆83Updated 2 months ago
ruixin31 / Spurious_Rewards
☆322Updated last week
UCSC-VLAA / VLAA-Thinking
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
☆129Updated 3 months ago
MMMU-Benchmark / MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E…
☆470Updated 2 months ago
HZQ950419 / Math-LLaVA
Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
☆90Updated last year
RAGEN-AI / VAGEN
☆197Updated this week