Koorye / PCD
Official implementation of the paper "Policy Contrastive Decoding for Robotic Foundation Models"
☆16Updated 2 months ago
Alternatives and similar repositories for PCD
Users who are interested in PCD are comparing it to the libraries listed below
- Official implementation of the paper "InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning"☆37Updated last month
- Data pre-processing and training code on Open-X-Embodiment with pytorch☆11Updated 6 months ago
- [ICCV2025 Oral] Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos☆119Updated 3 months ago
- Official repo for AGNOSTOS, a cross-task manipulation benchmark, and X-ICM, a cross-task in-context manipulation (VLA) method☆35Updated last month
- ☆53Updated 7 months ago
- ☆25Updated last year
- Official PyTorch Implementation of Learning Affordance Grounding from Exocentric Images, CVPR 2022☆64Updated 9 months ago
- Preview code of ECCV'24 paper "Distill Gold from Massive Ores" (BiLP)☆24Updated last year
- [NeurIPS-2024] The official Implementation of "Instruction-Guided Visual Masking"☆36Updated 8 months ago
- Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"☆123Updated 2 weeks ago
- Official Implementation of Frequency-enhanced Data Augmentation for Vision-and-Language Navigation (NeurIPS2023)☆14Updated last year
- [ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs☆58Updated 5 months ago
- [ICML 2025] OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction☆96Updated 3 months ago
- Official implementation of ECCV 2024 paper: Take A Step Back: Rethinking the Two Stages in Visual Reasoning☆14Updated 2 months ago
- ☆18Updated last year
- Official implementation of NeurIPS 2023 paper "FGPrompt: Fine-grained Goal Prompting for Image-goal Navigation"☆33Updated last year
- LLaVA-VLA: A Simple Yet Powerful Vision-Language-Action Model [Actively Maintained🔥]☆122Updated last week
- ICCV2025☆112Updated this week
- [CVPR 2024] Binding Touch to Everything: Learning Unified Multimodal Tactile Representations☆55Updated 6 months ago
- ☆71Updated 8 months ago
- 🦾 A Dual-System VLA with System2 Thinking☆84Updated 3 weeks ago
- [WIP] Code for LangToMo☆16Updated last month
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan…☆61Updated 4 months ago
- Embodied Question Answering (EQA) benchmark and method (ICCV 2025)☆29Updated last month
- GRAPE: Guided-Reinforced Vision-Language-Action Preference Optimization☆135Updated 4 months ago
- Can 3D Vision-Language Models Truly Understand Natural Language?☆21Updated last year
- Latest Advances on Vision-Language-Action Models.☆89Updated 5 months ago
- [ICML 2024] A Touch, Vision, and Language Dataset for Multimodal Alignment☆81Updated 2 months ago
- ☆50Updated last year
- Efficiently apply modification functions to RLDS/TFDS datasets.☆21Updated last year