zhengxuJosh / Awesome-Multimodal-Spatial-Reasoning
This repository collects and organises state‑of‑the‑art papers on spatial reasoning for Multimodal Vision–Language Models (MVLMs).
☆278 · Feb 10, 2026 · Updated last week
Alternatives and similar repositories for Awesome-Multimodal-Spatial-Reasoning
Users who are interested in Awesome-Multimodal-Spatial-Reasoning are comparing it to the repositories listed below.
- (CVPR Workshop Best Paper Award) Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustn… ☆17 · Nov 4, 2025 · Updated 3 months ago
- A minimal Python library for live coding visual scenes using desktop windows. ☆71 · Jan 19, 2026 · Updated 3 weeks ago
- A lightweight ComfyUI custom node pack for Qwen3-ASR, providing simple speech‑to‑text workflows with local model caching and optional tim… ☆36 · Jan 31, 2026 · Updated 2 weeks ago
- A multi-agent LLM system for detecting and resolving cognitive dissonance. ☆276 · Oct 14, 2025 · Updated 4 months ago
- SAM4SS: Tailoring SAM and SAM2 for Semantic Segmentation ☆11 · Jul 31, 2024 · Updated last year
- Code & Weights for “Learning Robust Anymodal Segmentor with Unimodal and Cross-modal Distillation” ☆14 · Dec 6, 2024 · Updated last year
- (ICCV 2025) OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation ☆13 · Oct 11, 2025 · Updated 4 months ago
- ☆31 · Jul 16, 2025 · Updated 7 months ago
- 🌐 A Roadmap for 3D Scene Understanding in the Wild ☆21 · Dec 19, 2025 · Updated last month
- A faster assert! for Rust ☆48 · Aug 11, 2025 · Updated 6 months ago
- Code for the 4th Monocular Depth Estimation Challenge @ CVPR 2025 ☆17 · Jan 26, 2025 · Updated last year
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆121 · Aug 10, 2025 · Updated 6 months ago
- The Missing Point in Vision Transformers for Universal Image Segmentation ☆57 · Nov 14, 2025 · Updated 3 months ago
- ☆15 · Jun 16, 2025 · Updated 8 months ago
- A modern, real-time monitoring dashboard built with FastAPI and Svelte. This application demonstrates real-time data streaming using Serv… ☆19 · Mar 31, 2025 · Updated 10 months ago
- Provision an OpenAI account with GPT model and RBAC role for your user account for keyless access. ☆22 · Nov 24, 2025 · Updated 2 months ago
- Accelerating Streaming Video Large Language Models via Hierarchical Token Compression ☆43 · Jan 6, 2026 · Updated last month
- [ACL'25 Oral] Code for the paper "UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban… ☆26 · Jul 15, 2025 · Updated 7 months ago
- ☆522 · Jan 28, 2026 · Updated 2 weeks ago
- ☆62 · Updated this week
- Lucid Agents Commerce SDK. Bootstrap AI agents in 60 seconds that can pay, sell, and participate in agentic commerce supply chains. Our p… ☆162 · Updated this week
- DSPy module for OpenAI Codex SDK - signature-driven agentic workflows ☆151 · Dec 8, 2025 · Updated 2 months ago
- Real-time guardrails for Claude Code tool calls. ☆61 · Feb 4, 2026 · Updated last week
- Official repository for the ECCV20 paper: "From Shadow Segmentation to Shadow Removal" ☆16 · Nov 3, 2020 · Updated 5 years ago
- This repository contains the code and data for the paper "Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents wit… ☆54 · Feb 7, 2026 · Updated last week
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆54 · Mar 9, 2025 · Updated 11 months ago
- Official implementation of ResCLIP: Residual Attention for Training-free Dense Vision-language Inference ☆62 · Oct 27, 2025 · Updated 3 months ago
- OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language. ☆633 · Oct 29, 2025 · Updated 3 months ago
- TheNZT is a powerful multi-agent finance query processing system designed to process and respond to finance-related queries efficiently. … ☆30 · Feb 3, 2026 · Updated 2 weeks ago
- Official repository of Vision Test-Time Training ☆49 · Dec 7, 2025 · Updated 2 months ago
- ☆54 · Jan 16, 2026 · Updated last month
- Official Implementation of "Geometrically-Constrained Agent for Spatial Reasoning" ☆52 · Dec 18, 2025 · Updated last month
- 🔍Model Context Protocol (MCP) server for Apache Airflow API integration. Provides comprehensive tools for managing Airflow clusters incl… ☆44 · Jan 27, 2026 · Updated 3 weeks ago
- [ArXiv 2025] DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models ☆131 · Dec 25, 2025 · Updated last month
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos ☆32 · May 27, 2025 · Updated 8 months ago
- [CVPR 2025] The official implementation of "CacheQuant: Comprehensively Accelerated Diffusion Models" ☆44 · Nov 2, 2025 · Updated 3 months ago
- A paper list for spatial reasoning ☆643 · Jan 19, 2026 · Updated 3 weeks ago
- ☆18 · Mar 12, 2025 · Updated 11 months ago
- Official Implementation of OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation ☆38 · Jul 5, 2025 · Updated 7 months ago