yejun688 / CVPR2025_oral_paper_list
A curated list of CVPR 2025 Oral papers. Total: 96
★24 · Updated this week
Alternatives and similar repositories for CVPR2025_oral_paper_list
Users interested in CVPR2025_oral_paper_list are comparing it to the repositories listed below
- [CVPR'25] Official implementation of paper "MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders". ★25 · Updated 2 months ago
- The repo of paper "RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation". ★122 · Updated 5 months ago
- Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning. ★154 · Updated 2 weeks ago
- [CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models ★133 · Updated 8 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ★85 · Updated 9 months ago
- ★157 · Updated last month
- Official repository for VisionZip (CVPR 2025) ★285 · Updated last week
- ★14 · Updated last month
- Embodied Question Answering (EQA) benchmark and method ★20 · Updated 2 months ago
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models." ★265 · Updated last week
- OpenHelix: An Open-source Dual-System VLA Model for Robotic Manipulation ★169 · Updated last week
- [ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation ★97 · Updated 2 months ago
- [NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models" ★198 · Updated 5 months ago
- AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segm… ★83 · Updated 5 months ago
- HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model ★222 · Updated last month
- The official implementation of "VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning" ★143 · Updated this week
- Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks ★120 · Updated this week
- [ACMMM 2024] Consistent123: One Image to Highly Consistent 3D Asset Using Case-Aware Diffusion Priors ★23 · Updated 7 months ago
- NeurIPS 2024 ★34 · Updated last month
- [ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs ★53 · Updated 3 months ago
- Single-file implementation to advance vision-language-action (VLA) models with reinforcement learning. ★96 · Updated last week
- [ICML 2024] A Touch, Vision, and Language Dataset for Multimodal Alignment ★78 · Updated this week
- Official repo of VLABench, a large-scale benchmark designed for fairly evaluating VLA, Embodied Agent, and VLMs. ★227 · Updated this week
- Official code of paper "DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution" ★92 · Updated 3 months ago
- [CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration" ★72 · Updated 8 months ago
- [CVPR 2025] The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding". ★104 · Updated last month
- The official codebase for ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation (CVPR 2024) ★132 · Updated 10 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into a Single Large Language Model for Video Segmenta… ★38 · Updated last month
- [CVPR2025] FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression ★41 · Updated 3 months ago
- [ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities ★74 · Updated 7 months ago