thuml/iVideoGPT

thuml / iVideoGPT

Official repository for "iVideoGPT: Interactive VideoGPTs are Scalable World Models" (NeurIPS 2024), https://arxiv.org/abs/2405.15223

☆167

Alternatives and similar repositories for iVideoGPT

Users who are interested in iVideoGPT are comparing it to the repositories listed below.

thuml / ContextWM
Code release for "Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning" (NeurIPS 2023), https://ar…
☆70 · Sep 29, 2024 · Updated last year
s-tian / vp2
VP2 Benchmark (A Control-Centric Benchmark for Video Prediction, ICLR 2023)
☆30 · Mar 3, 2025 · Updated last year
flow-diffusion / AVDC
Official repository of Learning to Act from Actionless Videos through Dense Correspondences.
☆248 · Apr 25, 2024 · Updated last year
TencentARC / Moto
[ICCV2025 Oral] Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
☆164 · Oct 1, 2025 · Updated 5 months ago
1x-technologies / 1xgpt
world modeling challenge for humanoid robots
☆554 · Nov 8, 2024 · Updated last year
Robot-VLAs / RoboVLMs
☆443 · Nov 29, 2025 · Updated 3 months ago
thuml / RLVR-World
Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934
☆223 · Oct 28, 2025 · Updated 4 months ago
bytedance / IRASim
☆142 · Jul 8, 2025 · Updated 7 months ago
MohitShridhar / genima
Official Code Repo for GENIMA
☆77 · Oct 29, 2025 · Updated 4 months ago
Large-Trajectory-Model / ATM
Official codebase for "Any-point Trajectory Modeling for Policy Learning"
☆273 · Jun 19, 2025 · Updated 8 months ago
UMass-Embodied-AGI / MultiPLY
Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
☆133 · Oct 24, 2024 · Updated last year
bytedance / GR-1
Code for "Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation"
☆300 · Apr 22, 2024 · Updated last year
bytedance / GR-MG
Official implementation of GR-MG
☆93 · Jan 12, 2025 · Updated last year
maitrix-org / Pandora
Pandora: Towards General World Model with Natural Language Actions and Video States
☆532 · Sep 23, 2024 · Updated last year
YanjieZe / GNFactor
[CoRL 2023 Oral] GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields
☆138 · Dec 28, 2023 · Updated 2 years ago
taldatech / ddlp
[TMLR 2024] Official PyTorch Implementation of Deep Dynamic Latent Particles
☆16 · Feb 8, 2024 · Updated 2 years ago
EDiRobotics / GR1-Training
Reimplementation of GR-1, a generalized policy for robotics manipulation.
☆147 · Sep 4, 2024 · Updated last year
nickgkan / 3d_diffuser_actor
Code for the paper "3D Diffuser Actor: Policy Diffusion with 3D Scene Representations"
☆384 · Aug 17, 2024 · Updated last year
jayLEE0301 / vq_bet_official
Official code for "Behavior Generation with Latent Actions" (ICML 2024 Spotlight)
☆197 · Feb 28, 2024 · Updated 2 years ago
flow-diffusion / AVDC_experiments
The official codebase for running the experiments described in the AVDC paper.
☆20 · Oct 2, 2024 · Updated last year
hukz18 / Stem-Ob-Code
Official repo for arxiv paper "Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion I…
☆17 · Nov 8, 2024 · Updated last year
yunhaif / fowm
Finetuning Offline World Models in the Real World
☆65 · Oct 25, 2023 · Updated 2 years ago
octo-models / octo
Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.
☆1,552 · Jul 31, 2024 · Updated last year
intuitive-robots / mdt_policy
[RSS 2024] Code for "Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals" for CALVIN experiments with pre…
☆168 · Oct 16, 2024 · Updated last year
simpler-env / SimplerEnv
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Goo…
☆980 · Dec 20, 2025 · Updated 2 months ago
MaxSobolMark / PolicyAgnosticRL
☆87 · Aug 4, 2025 · Updated 7 months ago
SudeepDasari / data4robotics
☆76 · Oct 18, 2024 · Updated last year
leor-c / REM
Improving Token-Based World Models with Parallel Observation Prediction (ICML 2024)
☆14 · Feb 23, 2026 · Updated last week
mlpc-ucsd / XTRA
On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning
☆16 · Apr 30, 2023 · Updated 2 years ago
HeegerGao / FLIP
Code for FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks
☆79 · Dec 12, 2024 · Updated last year
myscience / open-genie
Pytorch implementation of "Genie: Generative Interactive Environments", Bruce et al. (2024).
☆267 · Aug 21, 2024 · Updated last year
GuanxingLu / ManiGaussian
[ECCV 2024] ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
☆268 · Mar 24, 2025 · Updated 11 months ago
younggyoseo / MWM
Masked World Models for Visual Control
☆135 · Jun 11, 2023 · Updated 2 years ago
ShuangLI59 / unified_video_action
Official PyTorch Implementation of Unified Video Action Model (RSS 2025)
☆338 · Jul 23, 2025 · Updated 7 months ago
rmrafailov / kitchen
☆13 · Mar 7, 2022 · Updated 3 years ago
HaoyiZhu / SPA
[ICLR 2025] SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
☆172 · Jun 19, 2025 · Updated 8 months ago
aiming-lab / GRAPE
GRAPE: Guided-Reinforced Vision-Language-Action Preference Optimization
☆159 · Apr 6, 2025 · Updated 10 months ago
LatentActionPretraining / LAPA
[ICLR 2025] LAPA: Latent Action Pretraining from Videos
☆472 · Jan 22, 2025 · Updated last year
FrankZheng2022 / PRISE
Codebase for PRISE: Learning Temporal Action Abstractions as a Sequence Compression Problem
☆24 · Jul 11, 2024 · Updated last year