mazpie / genrl

[NeurIPS 2024] GenRL: Multimodal-foundation world models ground language and video prompts in embodied domains by turning them into sequences of latent world-model states. These latent state sequences can be decoded with the model's decoder, so the expected behavior can be visualized before the agent is trained to execute it.
63 · Updated this week
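The pipeline in the description (prompt, then a sequence of latent states, then decoded observations for inspection) can be illustrated with a toy sketch. Everything below is an assumption for illustration only. The names `prompt_to_latents` and `decode_latents`, the linear "connector", the linear latent dynamics, and the linear decoder are hypothetical stand-ins; they are not GenRL's actual architecture, modules, or API.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def prompt_to_latents(prompt_emb: Vector, connector: Matrix,
                      dynamics: Matrix, horizon: int) -> List[Vector]:
    """Hypothetical connector: project a prompt embedding into the world
    model's latent space, then roll a toy linear latent dynamics model
    forward to obtain a sequence of `horizon` target states (horizon >= 1)."""
    state = matvec(connector, prompt_emb)
    states = [state]
    for _ in range(horizon - 1):
        state = matvec(dynamics, state)
        states.append(state)
    return states


def decode_latents(states: List[Vector], decoder: Matrix) -> List[Vector]:
    """Hypothetical decoder: map each latent state back to observation
    space so the expected behavior can be inspected before training."""
    return [matvec(decoder, s) for s in states]


if __name__ == "__main__":
    prompt_emb = [0.0, 1.0]                # stand-in for a language/video embedding
    connector = [[1.0, 0.0], [0.0, 1.0]]   # identity projection (toy)
    dynamics = [[1.0, 1.0], [0.0, 1.0]]    # toy latent transition
    decoder = [[2.0, 0.0], [0.0, 2.0]]     # toy decoder: scales each latent by 2
    latents = prompt_to_latents(prompt_emb, connector, dynamics, horizon=3)
    frames = decode_latents(latents, decoder)
    print(latents)  # [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    print(frames)   # [[0.0, 2.0], [2.0, 2.0], [4.0, 2.0]]
```

With these toy values the three latent states are [0, 1], [1, 1], [2, 1], and the decoded "frames" are each state scaled by 2. In a real world model the connector, dynamics, and decoder would be learned networks; the linear maps here only make the data flow easy to follow.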

Alternatives and similar repositories for genrl:

Users who are interested in genrl are comparing it to the libraries listed below.