TencentARC / Moto
Latent Motion Token as the Bridging Language for Robot Manipulation
☆48 · Updated last week
Alternatives and similar repositories for Moto:
Users interested in Moto are comparing it to the repositories listed below.
- ☆39 · Updated this week
- ☆64 · Updated last week
- Official repository for "iVideoGPT: Interactive VideoGPTs are Scalable World Models" (NeurIPS 2024), https://arxiv.org/abs/2405.15223 · ☆83 · Updated last week
- ☆19 · Updated last week
- ☆48 · Updated 3 months ago
- Code for the paper "Grounding Video Models to Actions through Goal Conditioned Exploration" · ☆33 · Updated last month
- ☆80 · Updated 4 months ago
- ☆46 · Updated last week
- Egocentric Video Understanding Dataset (EVUD) · ☆24 · Updated 5 months ago
- Official code for the paper "DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution" · ☆47 · Updated last month
- Official implementation of GR-MG · ☆59 · Updated 2 weeks ago
- AnyBimanual: Transferring Single-arm Policy for General Bimanual Manipulation · ☆46 · Updated this week
- Repository for "General Flow as Foundation Affordance for Scalable Robot Learning" · ☆42 · Updated 8 months ago
- Repository for the paper "RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation" · ☆69 · Updated last week
- [ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities · ☆62 · Updated 2 months ago
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation · ☆83 · Updated last month
- [NeurIPS 2024] Official code repository for the MSR3D paper · ☆28 · Updated last month
- Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World · ☆122 · Updated last month
- LAPA: Latent Action Pretraining from Videos · ☆100 · Updated 3 weeks ago
- Affordance Grounding from Demonstration Video to Target Image (CVPR 2023) · ☆41 · Updated 4 months ago
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks · ☆59 · Updated 2 months ago
- ☆43 · Updated 8 months ago
- [ICLR 2023] SQA3D for embodied scene understanding and reasoning · ☆121 · Updated last year
- [ECCV 2024, Oral, Best Paper Finalist] Official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation …" · ☆35 · Updated last month
- Code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding" · ☆24 · Updated this week
- A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation · ☆83 · Updated 2 weeks ago
- Official repository of Learning to Act from Actionless Videos through Dense Correspondences · ☆183 · Updated 7 months ago
- [RSS 2024] Learning Manipulation by Predicting Interaction · ☆93 · Updated 4 months ago
- Code and data for Grounded 3D-LLM with Referent Tokens · ☆93 · Updated 2 months ago
- Code release for "Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning" (NeurIPS 2023), https://ar… · ☆55 · Updated 2 months ago