sanbuphy / computer-vision-referenceLinks

Collected the world's best computer vision labs and lecture materials.

☆14

Alternatives and similar repositories for computer-vision-reference

Users that are interested in computer-vision-reference are comparing it to the libraries listed below

Sorting:

chenllliang / G1
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning
☆77Updated 2 months ago
GAIR-NLP / MAYE
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
☆138Updated 4 months ago
OpenGVLab / V2PE
[ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
☆54Updated 7 months ago
ekonwang / VisuoThink
[Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics]: VisuoThink: Empowering LVLM Reasoning with Mul…
☆27Updated 2 weeks ago
cokeshao / Awesome-Multimodal-Token-Compression
Survey: https://arxiv.org/pdf/2507.20198
☆69Updated this week
yczhou001 / Awesome-Diffusion-LLM
paper list, tutorial, and nano code snippet for Diffusion Large Language Models.
☆96Updated last month
yunfeixie233 / ViGaL
☆50Updated last month
RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆105Updated 2 months ago
maomaocun / dLLM-cache
Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache…
☆132Updated this week
zitian-gao / one-shot-em
One-shot Entropy Minimization
☆175Updated last month
njucckevin / MM-Self-Improve
A Self-Training Framework for Vision-Language Reasoning
☆80Updated 6 months ago
xiaomi-research / colar
[arXiv2505] Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
☆34Updated last week
VisuLogic-Benchmark / VisuLogic-Eval
☆26Updated last month
ML-GSAI / LLaDA-V
☆194Updated this week
xinyan-cxy / MINT-CoT
☆62Updated last week
TIGER-AI-Lab / VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"
☆134Updated 2 months ago
NUS-TRAIL / NoisyRollout
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
☆83Updated 2 months ago
shiqichen17 / VLM_Merging
Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)
☆68Updated 2 months ago
GAIR-NLP / thinking-with-generated-images
Doodling our way to AGI ✏️ 🖼️ 🧠
☆86Updated 2 months ago
SUSTechBruce / LOOK-M
[EMNLP 2024 Findings🔥] Official implementation of ": LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…
☆98Updated 8 months ago
GaryStack / MMR-V
Official repository of the video reasoning benchmark MMR-V. Can Your MLLMs "Think with Video"?
☆35Updated last month
InternLM / OREAL
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
☆188Updated 4 months ago
ML-GSAI / Diffusion-LLM-Papers
A Collection of Papers on Diffusion Language Models
☆98Updated this week
LeapLabTHU / limit-of-RLVR
repo for paper https://arxiv.org/abs/2504.13837
☆180Updated last month
gyhdog99 / RACRO2
Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)
☆17Updated last month
ModalMinds / MM-PRM
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision
☆24Updated 2 months ago
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆81Updated 5 months ago
MiniMax-AI / One-RL-to-See-Them-All
The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning
☆308Updated 2 months ago
beichenzbc / BoostStep
official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"
☆36Updated 6 months ago
gszfwsb / AutoGnothi
Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models"
☆20Updated 5 months ago