SIBench / Awesome-Visual-Spatial-Reasoning
This is a project about visual spatial reasoning.
☆79Updated last month
Alternatives and similar repositories for Awesome-Visual-Spatial-Reasoning
Users who are interested in Awesome-Visual-Spatial-Reasoning are comparing it to the repositories listed below
- [2025CVPR] FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation☆38Updated 2 weeks ago
- EO: Open-source Unified Embodied Foundation Model Series☆272Updated 2 weeks ago
- GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models☆450Updated 2 months ago
- Official code release for paper "Robo-Imagine: A Robotic Video Generation Model, For Autoregressive Long-Term Task Video Generation With …☆28Updated 4 months ago
- [CVPR 2025] Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation☆71Updated 2 months ago
- Embodied Intelligence in Endovascular Robot Navigation (embodied navigation for vascular interventional surgery robots)☆18Updated last month
- ☆23Updated last week
- [ICML2025] Official Code of From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection☆24Updated 5 months ago
- ☆54Updated 4 months ago
- Collections of Papers and Projects for Multimodal Reasoning.☆105Updated 7 months ago
- [ACM MM 2025] SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation. https://arxiv.org/abs/2506.03139☆68Updated 2 weeks ago
- [ICCV 2025] MM-IFEngine: Towards Multimodal Instruction Following☆113Updated 2 months ago
- [NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration☆92Updated 5 months ago
- [NeurIPS 2025 spotlight] Official implementation for "FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving…☆462Updated 2 months ago
- Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences (ICML 2025)☆26Updated 5 months ago
- Survey: https://arxiv.org/pdf/2507.20198☆218Updated last month
- [ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference".☆193Updated 5 months ago
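- This algorithm handles collision avoidance between a drone swarm and a drone joining it. There are two approaches: (1) the joining drone plans its path and actively maneuvers to avoid the formation, or (2) the formation makes small adjustments to give way to the joining drone. Only the first approach has been implemented so far. The only known information is the trajectory of the original swarm, F(x,y,z,t)|each plane; for the first approach, the only input parameter for the fill-in drone is…☆27Updated 3 months ago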
- ☆128Updated 8 months ago
- vue3-elementPlus-admin,vue3-elementPlus-template☆54Updated 2 weeks ago
- TorchHook: A PyTorch hooks manager, providing convenient interfaces to capture feature maps and debug models.☆13Updated last month
- Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models"☆78Updated 2 months ago
- A benchmark that evaluates LLMs' performance in automating drawing revision tasks.☆56Updated 3 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning☆98Updated 4 months ago
- [ICCV 2025] Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation.☆46Updated 3 months ago
- [NeurIPS 2025]⭐️ Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning.☆233Updated last month
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …☆193Updated 6 months ago
- [CVPR 2025] Official implementation of paper "MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders".☆47Updated 5 months ago
- A python script for downloading huggingface datasets and models.☆20Updated 7 months ago
- ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models☆66Updated 6 months ago