yejun688 / CVPR2025_oral_paper_list
A curated list of CVPR 2025 Oral papers. Total: 96
★38 · Updated last week
Alternatives and similar repositories for CVPR2025_oral_paper_list
Users interested in CVPR2025_oral_paper_list are comparing it to the repositories listed below:
- Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics" ★96 · Updated last week
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models". ★281 · Updated last month
- [NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models" ★215 · Updated 7 months ago
- A paper list for spatial reasoning ★119 · Updated last month
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ★156 · Updated 2 months ago
- Embodied Question Answering (EQA) benchmark and method (ICCV 2025) ★27 · Updated 3 weeks ago
- Official code for the CVPR 2025 paper "Navigation World Models". ★297 · Updated last week
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ★67 · Updated last week
- [CVPR 2025] The code for paper ''Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding''. ★133 · Updated last month
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ★285 · Updated 3 weeks ago
- Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning. ★173 · Updated last month
- Latest Advances on Embodied Multimodal LLMs (or Vision-Language-Action Models). ★116 · Updated last year
- SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation ★175 · Updated 2 weeks ago
- [ICCV 2025] MoMa-Kitchen: A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation ★27 · Updated last week
- [CVPR 2025] Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation ★149 · Updated 3 weeks ago
- ★44 · Updated 3 months ago
- The official implementation of the paper "Exploring the Potential of Encoder-free Architectures in 3D LMMs" ★53 · Updated last month
- ★21 · Updated last month
- ★148 · Updated 3 weeks ago
- A cutting-edge collection and survey of vision-language model papers and model GitHub repositories ★259 · Updated last week
- Official repo and evaluation implementation of VSI-Bench ★541 · Updated 2 weeks ago
- [CVPR2025] FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression ★45 · Updated 4 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ★45 · Updated 3 weeks ago
- WorldVLA: Towards Autoregressive Action World Model ★248 · Updated last week
- Up-to-date & curated list of awesome 3D Visual Grounding papers, methods & resources. ★191 · Updated 2 weeks ago
- Repository for Vision-and-Language Navigation via Causal Learning (Accepted by CVPR 2024) ★77 · Updated last month
- [CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding ★108 · Updated last month
- HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model ★254 · Updated last month
- [ICLR 2025] SPA: 3D Spatial-Awareness Enables Effective Embodied Representation ★162 · Updated 3 weeks ago
- ★242 · Updated 3 months ago