Koorye / PCD
Official implementation of the paper "Policy Contrastive Decoding for Robotic Foundation Models"
☆16 · Updated last month
Alternatives and similar repositories for PCD
Users that are interested in PCD are comparing it to the libraries listed below
- ☆49 · Updated 7 months ago
- Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics" ☆96 · Updated last week
- Official Implementation of Frequency-enhanced Data Augmentation for Vision-and-Language Navigation (NeurIPS 2023) ☆14 · Updated last year
- [ICLR 2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs ☆56 · Updated 4 months ago
- ☆70 · Updated 7 months ago
- ☆25 · Updated last year
- Can 3D Vision-Language Models Truly Understand Natural Language? ☆21 · Updated last year
- [ICCV 2025] Latent Motion Token as the Bridging Language for Robot Manipulation ☆110 · Updated 2 months ago
- Official repo for AGNOSTOS, a cross-task manipulation benchmark, and X-ICM, a cross-task in-context manipulation (VLA) method ☆31 · Updated 2 weeks ago
- Data pre-processing and training code on Open-X-Embodiment with PyTorch ☆11 · Updated 5 months ago
- [NeurIPS 2024] The official implementation of "Instruction-Guided Visual Masking" ☆35 · Updated 8 months ago
- [CVPR 2024] Binding Touch to Everything: Learning Unified Multimodal Tactile Representations ☆54 · Updated 5 months ago
- ☆63 · Updated this week
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆85 · Updated 10 months ago
- [ICML 2024] A Touch, Vision, and Language Dataset for Multimodal Alignment ☆78 · Updated last month
- Official code of the paper "DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution" ☆99 · Updated 5 months ago
- 🦾 A Dual-System VLA with System2 Thinking ☆66 · Updated last week
- [ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities ☆75 · Updated 9 months ago
- Unified Vision-Language-Action Model ☆128 · Updated 2 weeks ago
- SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation ☆176 · Updated 2 weeks ago
- ☆49 · Updated last year
- ☆18 · Updated last year
- Embodied Question Answering (EQA) benchmark and method (ICCV 2025) ☆27 · Updated 3 weeks ago
- Official implementation of the paper "InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning" ☆34 · Updated last week
- Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning ☆68 · Updated 2 months ago
- Official Implementation of CAPEAM (ICCV'23) ☆13 · Updated 7 months ago
- Latest advances on Vision-Language-Action models ☆83 · Updated 4 months ago
- Official PyTorch implementation of Learning Affordance Grounding from Exocentric Images, CVPR 2022 ☆62 · Updated 8 months ago
- ICCV 2025 ☆103 · Updated 2 weeks ago
- Code for the paper "Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection" ☆35 · Updated 4 months ago