bdaiinstitute/theia
Theia: Distilling Diverse Vision Foundation Models for Robot Learning
⭐220 · Updated 5 months ago
Alternatives and similar repositories for theia:
Users interested in theia are comparing it to the libraries listed below.
- 🔥 [ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy ⭐197 · Updated last week
- OpenVLA: An open-source vision-language-action model for robotic manipulation. ⭐145 · Updated last week
- Code for subgoal synthesis via image editing ⭐130 · Updated last year
- Unified Video Action Model ⭐123 · Updated last week
- Embodied Chain of Thought: A robotic policy that reasons to solve the task. ⭐189 · Updated 2 weeks ago
- A Vision-Language Model for Spatial Affordance Prediction in Robotics ⭐138 · Updated 3 weeks ago
- Official codebase for "Any-point Trajectory Modeling for Policy Learning" ⭐209 · Updated 7 months ago
- [ICLR 2025] LAPA: Latent Action Pretraining from Videos ⭐199 · Updated 2 months ago
- Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success ⭐215 · Updated 2 weeks ago
- Embodied Reasoning Question Answer (ERQA) Benchmark ⭐95 · Updated 2 weeks ago
- Official repository of "Learning to Act from Actionless Videos through Dense Correspondences" ⭐204 · Updated 11 months ago
- A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation ⭐205 · Updated last month
- DROID Policy Learning and Evaluation ⭐175 · Updated 3 months ago
- ⭐162 · Updated last year
- ⭐46 · Updated 3 months ago
- ⭐56 · Updated last week
- F3RM: Feature Fields for Robotic Manipulation. Official repo for the paper "Distilled Feature Fields Enable Few-Shot Language-Guided Mani…" ⭐198 · Updated 11 months ago
- A unified architecture for multimodal multi-task robotic policy learning. ⭐139 · Updated last year
- ⭐312 · Updated 2 months ago
- [RSS 2024] Code for "Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals" for CALVIN experiments with pre… ⭐115 · Updated 5 months ago
- Code for "Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation" ⭐238 · Updated 11 months ago
- 🔥 SpatialVLA: a spatial-enhanced vision-language-action model that is trained on 1.1 million real robot episodes. ⭐174 · Updated last week
- ⭐237 · Updated 7 months ago
- Code for the paper "3D Diffuser Actor: Policy Diffusion with 3D Scene Representations" ⭐290 · Updated 7 months ago
- [CoRL 2024] Official repo of "A3VLM: Actionable Articulation-Aware Vision Language Model" ⭐109 · Updated 5 months ago
- Reimplementation of GR-1, a generalized policy for robot manipulation. ⭐124 · Updated 6 months ago
- [ICLR 2025 Oral] Seer: Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation ⭐120 · Updated this week
- ⭐102 · Updated last year
- A Benchmark for Evaluating Generalization for Robotic Manipulation ⭐107 · Updated 3 weeks ago
- ⭐66 · Updated 2 weeks ago