Koorye / PCD
Official implementation of the paper "Policy Contrastive Decoding for Robotic Foundation Models"
☆17 · Updated last month
Alternatives and similar repositories for PCD
Users that are interested in PCD are comparing it to the libraries listed below
- Official implementation of the paper "InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning" ☆45 · Updated 2 months ago
- Official PyTorch Implementation of Learning Affordance Grounding from Exocentric Images, CVPR 2022 ☆69 · Updated last year
- ☆24 · Updated 2 years ago
- Data pre-processing and training code on Open-X-Embodiment with PyTorch ☆11 · Updated 10 months ago
- [ICCV2025 Oral] Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos ☆148 · Updated last month
- Official code for "Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation" ☆104 · Updated 3 months ago
- [NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking" ☆39 · Updated last year
- ☆60 · Updated 11 months ago
- Official Implementation of CL-ALFRED (ICLR'24) ☆28 · Updated last year
- [CVPR 2024] Binding Touch to Everything: Learning Unified Multimodal Tactile Representations ☆71 · Updated last week
- [ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs ☆61 · Updated 9 months ago
- [IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning ☆74 · Updated 11 months ago
- Official Implementation of CAPEAM (ICCV'23) ☆14 · Updated 11 months ago
- Affordance Grounding from Demonstration Video to Target Image (CVPR 2023) ☆44 · Updated last year
- [ICML 2025] OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction ☆110 · Updated 7 months ago
- Egocentric Video Understanding Dataset (EVUD) ☆32 · Updated last year
- The official implementation of "Grounded Chain-of-Thought for Multimodal Large Language Models" ☆19 · Updated 4 months ago
- Preview code of the ECCV'24 paper "Distill Gold from Massive Ores" (BiLP) ☆25 · Updated last year
- [CVPR 2024] Data and benchmark code for the EgoExoLearn dataset ☆74 · Updated 3 months ago
- ☆104 · Updated 4 months ago
- [ICML 2024] A Touch, Vision, and Language Dataset for Multimodal Alignment ☆88 · Updated 5 months ago
- [NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning ☆31 · Updated last month
- InternVLA-A1: Unifying Understanding, Generation, and Action for Robotic Manipulation ☆54 · Updated 2 months ago
- [ICLR 2025] This repo is the official implementation of "The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs". ☆13 · Updated 10 months ago
- [WIP] Code for LangToMo ☆20 · Updated 5 months ago
- ☆18 · Updated last year
- ☆29 · Updated 3 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆62 · Updated 5 months ago
- InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy ☆286 · Updated 2 weeks ago
- Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning ☆76 · Updated 6 months ago