yejun688 / CVPR2025_oral_paper_list
😎 A curated list of CVPR 2025 Oral papers. Total: 96
☆45 · Updated last month
Alternatives and similar repositories for CVPR2025_oral_paper_list
Users interested in CVPR2025_oral_paper_list are comparing it to the repositories listed below.
- [NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models" ☆245 · Updated 8 months ago
- Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics" ☆145 · Updated last month
- A paper list for spatial reasoning ☆136 · Updated 2 months ago
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆339 · Updated 2 months ago
- [CVPR 2025] The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding" ☆154 · Updated 3 months ago
- Vision Manus: Your versatile Visual AI assistant ☆258 · Updated last week
- Embodied Question Answering (EQA) benchmark and method (ICCV 2025) ☆34 · Updated 3 weeks ago
- ☆28 · Updated 2 months ago
- ⭐️ Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning. ☆197 · Updated last month
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models" ☆301 · Updated 3 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆188 · Updated 4 months ago
- A vue-based project page template for academic papers. (in development) https://junyaohu.github.io/academic-project-page-template-vue ☆286 · Updated last month
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆76 · Updated last month
- Official repo and evaluation implementation of VSI-Bench ☆583 · Updated last month
- Official code for the CVPR 2025 paper "Navigation World Models". ☆374 · Updated 3 weeks ago
- ☆54 · Updated 5 months ago
- DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge ☆167 · Updated last week
- [ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs ☆67 · Updated 3 weeks ago
- ☆29 · Updated last month
- [CVPR'25] SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding ☆169 · Updated 4 months ago
- 😎 up-to-date & curated list of awesome 3D Visual Grounding papers, methods & resources ☆210 · Updated this week
- ☆106 · Updated 8 months ago
- Project Page for "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement" ☆503 · Updated last month
- ☆16 · Updated 2 months ago
- STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding? ☆28 · Updated last month
- ☆87 · Updated last month
- [CVPR'2022, TPAMI'2024] LAVT: Language-Aware Vision Transformer for Referring Segmentation ☆22 · Updated 7 months ago
- [ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation ☆120 · Updated last month
- A collection and survey of vision-language model papers and model GitHub repositories. Continuously updated. ☆352 · Updated last week
- [ICLR'25] Official code for the paper "MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs" ☆253 · Updated 4 months ago