yejun688 / CVPR2025_oral_paper_list
A curated list of CVPR 2025 Oral papers (96 total).
☆31 · Updated last week
Alternatives and similar repositories for CVPR2025_oral_paper_list
Users interested in CVPR2025_oral_paper_list are comparing it to the libraries listed below.
- Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics" ☆65 · Updated this week
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆43 · Updated last week
- [CVPR 2025] The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding" ☆117 · Updated 3 weeks ago
- AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segm… ☆83 · Updated 6 months ago
- [NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models" ☆211 · Updated 6 months ago
- The repo of the paper "RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation" ☆126 · Updated 6 months ago
- [ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities ☆74 · Updated 8 months ago
- The official implementation of the paper "Exploring the Potential of Encoder-free Architectures in 3D LMMs" ☆53 · Updated last month
- Embodied Question Answering (EQA) benchmark and method ☆21 · Updated 3 months ago
- [CVPR'25] Official implementation of the paper "MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders" ☆26 · Updated 2 weeks ago
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models" ☆270 · Updated 3 weeks ago
- ☆40 · Updated 2 months ago
- [NeurIPS 2024] Official code repository for the MSR3D paper ☆60 · Updated last week
- [CVPR 2024] GSVA: Generalized Segmentation via Multimodal Large Language Models ☆136 · Updated 9 months ago
- SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation ☆165 · Updated last month
- ☆95 · Updated last month
- Latest Advances on Embodied Multimodal LLMs (or Vision-Language-Action Models) ☆117 · Updated 11 months ago
- [ICLR 2025] SPA: 3D Spatial-Awareness Enables Effective Embodied Representation ☆157 · Updated last week
- [ICML 2024] A Touch, Vision, and Language Dataset for Multimodal Alignment ☆78 · Updated 3 weeks ago
- Official repo of VLABench, a large-scale benchmark designed for fairly evaluating VLAs, Embodied Agents, and VLMs ☆238 · Updated 3 weeks ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆85 · Updated 9 months ago
- [CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding ☆107 · Updated last month
- [CVPR'25] SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding ☆131 · Updated 2 months ago
- AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation ☆77 · Updated 2 months ago
- The official implementation of "VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning" ☆185 · Updated 3 weeks ago
- [CVPR 2025] Official repository of the paper "Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation" ☆98 · Updated last month
- Latent Motion Token as the Bridging Language for Robot Manipulation ☆105 · Updated last month
- HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model ☆239 · Updated last week
- Latest Advances on Vision-Language-Action Models ☆74 · Updated 3 months ago
- [CVPR 2025] Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation ☆146 · Updated this week