prism-visual-spatial-intelligence / Awesome-Visual-Spatial-Reasoning
This is a project about visual spatial reasoning.
☆53Updated last week
Alternatives and similar repositories for Awesome-Visual-Spatial-Reasoning
Users who are interested in Awesome-Visual-Spatial-Reasoning are comparing it to the repositories listed below.
- [CVPR 2025] Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation☆58Updated this week
- ☆23Updated last week
- ☆42Updated 2 months ago
- Embodied Intelligence in Endovascular Robot Navigation☆11Updated 3 months ago
- [2025CVPR] FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation☆27Updated last month
- Official code release for paper "Robo-Imagine: A Robotic Video Generation Model, For Autoregressive Long-Term Task Video Generation With …☆23Updated last month
- SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation. https://arxiv.org/abs/2506.03139☆62Updated 2 months ago
- OTFS-channel-estimation☆26Updated 2 months ago
- ☆39Updated 5 months ago
- [ICCV 2025] MM-IFEngine: Towards Multimodal Instruction Following☆101Updated 4 months ago
- [ICCV 2025] FonTS: Text Rendering with Typography and Style Controls☆23Updated this week
- ☆104Updated last month
- [ICML2025] Official Code of From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection☆22Updated 2 months ago
- Official Repository of ACL 2025 paper OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference☆144Updated 6 months ago
- This repository will continuously update the latest papers, technical reports, benchmarks about multimodal reasoning!☆48Updated 5 months ago
- ☆21Updated 2 months ago
- Collections of Papers and Projects for Multimodal Reasoning.☆105Updated 4 months ago
- A collection of multimodal reasoning papers, codes, datasets, benchmarks and resources.☆292Updated last week
- ☆25Updated 3 weeks ago
- A benchmark that evaluates LLMs' performance in automating drawing revision tasks.☆57Updated last week
- Official implementation of MC-LLaVA.☆139Updated last week
- [CVPR' 25] Interleaved-Modal Chain-of-Thought☆80Updated last week
- TorchHook: A PyTorch hooks manager, providing convenient interfaces to capture feature maps and debug models.☆12Updated 3 months ago
- https://arxiv.org/abs/2408.02032☆118Updated 7 months ago
- vue3-elementPlus-admin, vue3-elementPlus-template☆31Updated this week
- A python script for downloading huggingface datasets and models.☆19Updated 4 months ago
- More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models☆48Updated 3 months ago
- 🔥CVPR 2025 Multimodal Large Language Models Paper List☆152Updated 5 months ago
- [ACL'25 Main] Official Implementation of HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Languag…