Koorye / PCD
Official implementation of the paper "Policy Contrastive Decoding for Robotic Foundation Models"
☆16Updated 3 weeks ago
Alternatives and similar repositories for PCD
Users that are interested in PCD are comparing it to the libraries listed below
- Official implementation of the paper "InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning"☆29Updated 2 weeks ago
- Official Implementation of Frequency-enhanced Data Augmentation for Vision-and-Language Navigation (NeurIPS2023)☆14Updated last year
- Data pre-processing and training code on Open-X-Embodiment with pytorch☆11Updated 5 months ago
- Preview code of ECCV'24 paper "Distill Gold from Massive Ores" (BiLP)☆24Updated 11 months ago
- Official PyTorch Implementation of Learning Affordance Grounding from Exocentric Images, CVPR 2022☆62Updated 7 months ago
- Official repo for AGNOSTOS, a cross-task manipulation benchmark, and X-ICM, a cross-task in-context manipulation (VLA) method☆29Updated last month
- Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"☆65Updated this week
- Latent Motion Token as the Bridging Language for Robot Manipulation☆105Updated last month
- Embodied Question Answering (EQA) benchmark and method (ICCV 2025)☆22Updated this week
- VisualGPTScore for visio-linguistic reasoning☆27Updated last year
- ☆25Updated last year
- ☆46Updated 6 months ago
- [CVPR 2024] Binding Touch to Everything: Learning Unified Multimodal Tactile Representations☆52Updated 4 months ago
- Can 3D Vision-Language Models Truly Understand Natural Language?☆21Updated last year
- Consistent Prompting for Rehearsal-Free Continual Learning [CVPR2024]☆33Updated 2 weeks ago
- ☆71Updated 6 months ago
- Affordance Grounding from Demonstration Video to Target Image (CVPR 2023)☆44Updated 11 months ago
- The official implementation of "Grounded Chain-of-Thought for Multimodal Large Language Models"☆12Updated 3 months ago
- [ICML 2024] A Touch, Vision, and Language Dataset for Multimodal Alignment☆78Updated 3 weeks ago
- [ICML 2024] SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning☆30Updated 8 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta…☆43Updated last week
- ☆18Updated last year
- [CVPR 2025] Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning☆36Updated 2 months ago
- [arXiv] Cross-Modal Adapter for Text-Video Retrieval☆55Updated 2 years ago
- [ICML 2025] OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction☆83Updated 2 months ago
- Official Code For VLA-OS.☆17Updated this week
- [AAAI2023] Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task (Oral)☆39Updated last year
- Uni-OVSeg is a weakly supervised open-vocabulary segmentation framework that leverages unpaired mask-text pairs.☆52Updated last year
- [CVPR 2024] Official implementation of the paper "DePT: Decoupled Prompt Tuning"☆104Updated 3 weeks ago
- Official implementation of ECCV 2024 paper: Take A Step Back: Rethinking the Two Stages in Visual Reasoning☆14Updated 3 weeks ago