Koorye / PCD
Official implementation of the paper "Policy Contrastive Decoding for Robotic Foundation Models"
☆16Updated this week
Alternatives and similar repositories for PCD
Users that are interested in PCD are comparing it to the libraries listed below
- ☆24Updated 2 years ago
- Official implementation of the paper "InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning"☆44Updated 2 weeks ago
- Official code for "Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation"☆86Updated last month
- [ICCV2025 Oral] Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos☆135Updated 2 weeks ago
- [NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking"☆39Updated 11 months ago
- Official PyTorch Implementation of Learning Affordance Grounding from Exocentric Images, CVPR 2022☆67Updated 11 months ago
- Preview code of ECCV'24 paper "Distill Gold from Massive Ores" (BiLP)☆25Updated last year
- Egocentric Video Understanding Dataset (EVUD)☆31Updated last year
- Data pre-processing and training code on Open-X-Embodiment with pytorch☆11Updated 8 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta…☆53Updated 3 months ago
- [IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning☆73Updated 10 months ago
- ☆26Updated 6 months ago
- The official implementation of "Grounded Chain-of-Thought for Multimodal Large Language Models"☆16Updated 2 months ago
- [NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning☆25Updated this week
- Can 3D Vision-Language Models Truly Understand Natural Language?☆21Updated last year
- ☆58Updated 10 months ago
- VisualGPTScore for visio-linguistic reasoning☆27Updated 2 years ago
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval☆41Updated 6 months ago
- [ICML 2025] OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction☆106Updated 6 months ago
- ICCV2025☆135Updated last month
- [ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs☆60Updated 7 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆86Updated last year
- [AAAI 2025] Open-vocabulary Video Instance Segmentation Codebase built upon Detectron2, which is really easy to use.☆24Updated 9 months ago
- Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation (NeurIPS 2023)☆22Updated 2 years ago
- ☆96Updated 2 months ago
- Code of the ICCV 2023 paper "March in Chat: Interactive Prompting for Remote Embodied Referring Expression"☆26Updated last year
- Official Implementation of ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25)☆24Updated 8 months ago
- [ICML 2024] SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning☆30Updated last year
- InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy☆129Updated this week
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan…☆61Updated 6 months ago