bdaiinstitute / theia

Theia: Distilling Diverse Vision Foundation Models for Robot Learning

☆259

Alternatives and similar repositories for theia

Users who are interested in theia are comparing it to the repositories listed below.

allenai / molmoact
Official Repository for MolmoAct
☆258 Updated last month
LostXine / LLaRA
[ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
☆225 Updated 8 months ago
Max-Fu / otter
[ICML 2025] OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
☆110 Updated 7 months ago
sam2act / sam2act
Official Repository for SAM2Act
☆214 Updated 3 months ago
Stanford-ILIAD / openvla-mini
OpenVLA: An open-source vision-language-action model for robotic manipulation.
☆299 Updated 8 months ago
wentaoyuan / RoboPoint
A Vision-Language Model for Spatial Affordance Prediction in Robotics
☆205 Updated 4 months ago
gaoyuezhou / dino_wm
☆314 Updated 8 months ago
kvablack / susie
Code for subgoal synthesis via image editing
☆145 Updated 2 years ago
ShuangLI59 / unified_video_action
Official PyTorch Implementation of Unified Video Action Model (RSS 2025)
☆296 Updated 4 months ago
changhaonan / A3VLM
[CoRL2024] Official repo of `A3VLM: Actionable Articulation-Aware Vision Language Model`
☆121 Updated last year
intuitive-robots / mdt_policy
[RSS 2024] Code for "Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals" for CALVIN experiments with pre…
☆158 Updated last year
LatentActionPretraining / LAPA
[ICLR 2025] LAPA: Latent Action Pretraining from Videos
☆407 Updated 10 months ago
NVlabs / RoboSpatial
☆120 Updated last month
NVlabs / vla0
VLA-0: Building State-of-the-Art VLAs with Zero Modification
☆312 Updated last week
WEIRDLabUW / unified-world-model
Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets
☆158 Updated last month
InternRobotics / VLAC
VLAC: A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning
☆233 Updated 2 months ago
bytedance / GR-1
Code for "Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation"
☆288 Updated last year
Large-Trajectory-Model / ATM
Official codebase for "Any-point Trajectory Modeling for Policy Learning"
☆264 Updated 5 months ago
rail-berkeley / crossformer
☆269 Updated last year
droid-dataset / droid_policy_learning
DROID Policy Learning and Evaluation
☆253 Updated 7 months ago
Dantong88 / LLARVA
☆60 Updated 11 months ago
baaivision / UniVLA
Unified Vision-Language-Action Model
☆243 Updated last month
embodiedreasoning / ERQA
Embodied Reasoning Question Answer (ERQA) Benchmark
☆243 Updated 8 months ago
flow-diffusion / AVDC
Official repository of Learning to Act from Actionless Videos through Dense Correspondences.
☆237 Updated last year
mlzxy / arp
Autoregressive Policy for Robot Learning (RA-L 2025)
☆144 Updated 8 months ago
rail-berkeley / bridge_data_v2
☆242 Updated last year
jayLEE0301 / vq_bet_official
Official code for "Behavior Generation with Latent Actions" (ICML 2024 Spotlight)
☆189 Updated last year
f3rm / f3rm
F3RM: Feature Fields for Robotic Manipulation. Official repo for the paper "Distilled Feature Fields Enable Few-Shot Language-Guided Mani…
☆211 Updated last year
shikharbahl / vrb
☆127 Updated 2 years ago
NVIDIA / GR00T-Dreams
Nvidia GEAR Lab's initiative to solve the robotics data problem using world models
☆390 Updated last month