Koorye / PCD
Official implementation of the paper "Policy Contrastive Decoding for Robotic Foundation Models"
☆16 · Updated 3 months ago
Alternatives and similar repositories for PCD
Users that are interested in PCD are comparing it to the libraries listed below
- Official implementation of the paper "InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning" ☆45 · Updated 3 weeks ago
- [ICCV2025 Oral] Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos ☆133 · Updated 4 months ago
- ☆74 · Updated 9 months ago
- [NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking" ☆38 · Updated 10 months ago
- Preview code of ECCV'24 paper "Distill Gold from Massive Ores" (BiLP) ☆25 · Updated last year
- ☆24 · Updated last year
- Official code for "Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation" ☆72 · Updated last month
- Official PyTorch Implementation of Learning Affordance Grounding from Exocentric Images, CVPR 2022 ☆68 · Updated 10 months ago
- The official implementation of "Grounded Chain-of-Thought for Multimodal Large Language Models" ☆14 · Updated 2 months ago
- Official implementation of ECCV 2024 paper: Take A Step Back: Rethinking the Two Stages in Visual Reasoning ☆14 · Updated 3 months ago
- InternVLA-A1: Unifying Understanding, Generation, and Action for Robotic Manipulation ☆30 · Updated last week
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆53 · Updated 3 months ago
- ☆55 · Updated 9 months ago
- Egocentric Video Understanding Dataset (EVUD) ☆31 · Updated last year
- [ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs ☆59 · Updated 6 months ago
- [ICML 2025] OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction ☆105 · Updated 5 months ago
- ☆18 · Updated last year
- [CVPR 2024] Data and benchmark code for the EgoExoLearn dataset ☆70 · Updated last month
- ☆84 · Updated 2 months ago
- Can 3D Vision-Language Models Truly Understand Natural Language? ☆21 · Updated last year
- [AAAI 2025] Open-vocabulary Video Instance Segmentation Codebase built upon Detectron2, which is really easy to use. ☆24 · Updated 8 months ago
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan… ☆61 · Updated 6 months ago
- [ICML 2024] A Touch, Vision, and Language Dataset for Multimodal Alignment ☆84 · Updated 3 months ago
- Awesome papers on multimodal LLMs with grounding ability ☆19 · Updated last year
- ☆35 · Updated 2 months ago
- Unified Vision-Language-Action Model ☆193 · Updated 2 months ago
- Data pre-processing and training code on Open-X-Embodiment with PyTorch ☆11 · Updated 8 months ago
- ☆26 · Updated 5 months ago
- Affordance Grounding from Demonstration Video to Target Image (CVPR 2023) ☆44 · Updated last year
- VisualGPTScore for visio-linguistic reasoning ☆27 · Updated last year