[Awesome-Spatial-VLMs] This repository is the official, community-maintained resource for the survey paper: Spatial Intelligence in Vision-Language Models: A Comprehensive Survey
☆70Mar 19, 2026Updated this week
Alternatives and similar repositories for Awesome-Spatial-VLMs
Users who are interested in Awesome-Spatial-VLMs are comparing it to the repositories listed below
- Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!☆11May 24, 2023Updated 2 years ago
- Benchmarking Multi-Image Understanding in Vision and Language Models☆12Jul 29, 2024Updated last year
- Spatial Aptitude Training for Multimodal Language Models☆25Feb 8, 2026Updated last month
- Official implementation of StochSync: a zero-shot approach for image generation in arbitrary spaces via stochastic diffusion synchronizat…☆21Jun 24, 2025Updated 8 months ago
- TR2M: Transferring Monocular Relative Depth to Metric Depth with Language Descriptions and Dual-Level Scale-Oriented Contrast☆20Mar 3, 2026Updated 2 weeks ago
- ☆35Apr 4, 2024Updated last year
- Code of 3DMIT: 3D MULTI-MODAL INSTRUCTION TUNING FOR SCENE UNDERSTANDING☆32Jul 26, 2024Updated last year
- [NeurIPS 2025] HoliTom: Holistic Token Merging for Fast Video Large Language Models☆73Oct 10, 2025Updated 5 months ago
- Official Implementation of "Semantics-Consistent Feature Search for Self-Supervised Visual Representation Learning" in AAAI2024.☆13Feb 28, 2024Updated 2 years ago
- ☆13Jun 4, 2025Updated 9 months ago
- [ICLR 2026] Mono4DGS-HDR: High Dynamic Range 4D Gaussian Splatting from Alternating-exposure Monocular Videos☆28Jan 26, 2026Updated last month
- ICML 2025 Spotlight, PCEvolve: Private Contrastive Evolution for Synthetic Dataset Generation via Few-Shot Private Data and Generative AP…☆14Jun 27, 2025Updated 8 months ago
- PyTorch Implementation for InMaP☆11Oct 28, 2023Updated 2 years ago
- ☆16Jun 10, 2025Updated 9 months ago
- Page for the CVPR 2023 Tutorial - Efficient Neural Networks: From Algorithm Design to Practical Mobile Deployments☆12Jun 30, 2023Updated 2 years ago
- Code for NeurIPS 2024 work "MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps"☆17Dec 11, 2024Updated last year
- Official implementation for the paper "Towards Understanding How Knowledge Evolves in Large Vision-Language Models"☆33Apr 10, 2025Updated 11 months ago
- [Arxiv 2025] Official code and datasets of paper: GNNs as Predictors of Agentic Workflow Performances☆21Jan 15, 2026Updated 2 months ago
- Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images☆18Jun 4, 2025Updated 9 months ago
- ☆15Nov 3, 2022Updated 3 years ago
- CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs (CVPR2024)☆17Jun 14, 2024Updated last year
- ☆11May 6, 2025Updated 10 months ago
- Code for "Contrast then Memorize: Semantic Neighbor Retrieval-Enhanced Inductive Multimodal Knowledge Graph Completion", SIGIR 2024.☆13Feb 20, 2025Updated last year
- Official Implementation of DiffCLIP: Differential Attention Meets CLIP☆54Mar 12, 2025Updated last year
- Data for SubTask A☆17Dec 13, 2021Updated 4 years ago
- Official PyTorch implementation of paper “InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction”☆33Jul 28, 2025Updated 7 months ago
- [NeurIPS 2024 Oral] "Bayesian-Guided Label Mapping for Visual Reprogramming"☆12Dec 20, 2024Updated last year
- ☆17Jan 18, 2026Updated 2 months ago
- Implementation of the SuRP algorithm by the authors of the AISTATS 2022 paper "An Information-Theoretic Justification for Model Pruning".…☆14May 4, 2022Updated 3 years ago
- ☆54Jan 17, 2025Updated last year
- ☆14Jun 10, 2019Updated 6 years ago
- [AAAI 2023] Pytorch Implementation for AAAI2023 paper: One-for-All: Proposal Masked Cross-Class Anomaly Detection☆15Oct 31, 2024Updated last year
- [WACV 2024 LLVM-AD Challenge] UCU Dataset☆15Sep 9, 2023Updated 2 years ago
- Official implementation of CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoor Object Detection from Multi-view Images☆20Jun 24, 2024Updated last year
- [TIP'23] 4D LUT: Learnable Context-Aware 4D Lookup Table for Image Enhancement☆27Jan 3, 2024Updated 2 years ago
- Code and Data for "FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation" (ACL25)☆29Oct 26, 2025Updated 4 months ago
- [CIKM2023] GiGaMA: Generalizable Graph Masked Autoencoder via Collaborative Latent Space Reconstruction☆18Aug 31, 2023Updated 2 years ago
- ☆19Apr 10, 2017Updated 8 years ago
- Official implementation of paper "ACON: Optimizing Context Compression for Long-horizon LLM Agents"☆57Oct 14, 2025Updated 5 months ago