Gabesarch / grounded-rlLinks

☆97

Alternatives and similar repositories for grounded-rl

Users that are interested in grounded-rl are comparing it to the libraries listed below

Sorting:

UMass-Embodied-AGI / Mirage
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)
☆177Updated 2 months ago
PzySeere / MetaSpatial
MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …
☆191Updated 5 months ago
yyyybq / Awesome-Spatial-Reasoning
A paper list for spatial reasoning
☆144Updated 4 months ago
OuyangKun10 / SpaceR
SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning
☆88Updated 3 months ago
mll-lab-nu / MindCube
☆94Updated 3 weeks ago
USC-GVL / PhysBench
[ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …
☆72Updated 4 months ago
zhijie-group / R1-Zero-VSI
☆41Updated 4 months ago
ML-GSAI / LLaDA-V
☆254Updated last week
Open-Reasoner-Zero / Open-Vision-Reasoner
The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning".
☆142Updated last month
TIGER-AI-Lab / Pixel-Reasoner
Pixel-Level Reasoning Model trained with RL [NeuIPS25]
☆238Updated last month
AntResearchNLP / ViLaSR
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
☆73Updated 2 months ago
TencentARC / SEED-Bench-R1
☆90Updated 4 months ago
ChenYi99 / EgoPlan
[IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
☆74Updated 10 months ago
mll-lab-nu / TStar
TStar is a unified temporal search framework for long-form video question answering
☆69Updated last month
egolife-ai / Ego-R1
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
☆122Updated 2 months ago
InternRobotics / OST-Bench
[NeurIPS 2025] OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
☆64Updated 3 weeks ago
ZJU-REAL / ViewSpatial-Bench
ViewSpatial-Bench:Evaluating Multi-perspective Spatial Localization in Vision-Language Models
☆64Updated 4 months ago
linkangheng / Video-UTR
[ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs
☆60Updated 7 months ago
IranQin / MP5
[CVPR2024] This is the official implement of MP5
☆104Updated last year
sled-group / moh
[NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models
☆32Updated 11 months ago
yunlong10 / Awesome-Video-LMM-Post-Training
🔥🔥🔥 Latest Papers, Codes and Datasets on Video-LMM Post-Training
☆129Updated last week
Fr0zenCrane / UniCoT
Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
☆160Updated last month
AnjieCheng / SpatialRGPT
[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
☆269Updated 10 months ago
weijiawu / Awesome-Visual-Reinforcement-Learning
📖 This is a repository for organizing papers, codes and other resources related to Visual Reinforcement Learning.
☆307Updated this week
Haochen-Wang409 / ross
[ICLR'25] Reconstructive Visual Instruction Tuning
☆121Updated 6 months ago
penghao-wu / visual_jigsaw
☆53Updated 3 weeks ago
multimodal-reasoning-lab / Bagel-Zebra-CoT
https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT
☆87Updated 2 months ago
mahtabbigverdi / Aurora-perception
☆31Updated last month
jiayuww / SpatialEval
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs
☆53Updated 9 months ago
InternRobotics / MMSI-Bench
[arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
☆54Updated 2 months ago