gicheonkang / clip-rt
CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision
⭐ 14 · Updated 4 months ago
Alternatives and similar repositories for clip-rt:
Users interested in clip-rt are comparing it to the repositories listed below.
- Official PyTorch Implementation for CVPR'23 Paper, "The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training" ⭐ 20 · Updated last year
- Official Implementation of ReALFRED (ECCV'24) ⭐ 37 · Updated 5 months ago
- NeurIPS 2022 Paper "VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation" ⭐ 90 · Updated 2 years ago
- Visual Representation Learning with Stochastic Frame Prediction (ICML 2024) ⭐ 18 · Updated 4 months ago
- A Python Package for Seamless Data Distribution in AI Workflows ⭐ 21 · Updated last year
- ⭐ 42 · Updated 11 months ago
- Code and models of MOCA (Modular Object-Centric Approach) proposed in "Factorizing Perception and Policy for Interactive Instruction Following" ⭐ 37 · Updated 9 months ago
- ⭐ 44 · Updated 2 years ago
- Official Implementation of CAPEAM (ICCV'23) ⭐ 12 · Updated 4 months ago
- ⭐ 69 · Updated 3 months ago
- Official Code for Neural Systematic Binder ⭐ 32 · Updated 2 years ago
- PyTorch Code and Data for EnvEdit: Environment Editing for Vision-and-Language Navigation (CVPR 2022) ⭐ 31 · Updated 2 years ago
- ⭐ 45 · Updated 11 months ago
- Official Implementation of CL-ALFRED (ICLR'24) ⭐ 21 · Updated 5 months ago
- ⭐ 46 · Updated 3 months ago
- Code for MM 22 "Target-Driven Structured Transformer Planner for Vision-Language Navigation" ⭐ 15 · Updated 2 years ago
- Prompter for Embodied Instruction Following ⭐ 18 · Updated last year
- [ICRA2023] Grounding Language with Visual Affordances over Unstructured Data ⭐ 42 · Updated last year
- Official Implementation of IVLN-CE: Iterative Vision-and-Language Navigation in Continuous Environments ⭐ 31 · Updated last year
- Code for NeurIPS 2022 Datasets and Benchmarks paper — EgoTaskQA: Understanding Human Tasks in Egocentric Videos ⭐ 32 · Updated last year
- Implementation of our ICCV 2023 paper DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation ⭐ 19 · Updated last year
- Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer… ⭐ 90 · Updated last year
- ⭐ 66 · Updated 5 months ago
- Instruction Following Agents with Multimodal Transformers ⭐ 52 · Updated 2 years ago
- ⭐ 25 · Updated last year
- Code for the paper "ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts" (CVPR 2022) ⭐ 10 · Updated 2 years ago
- Implementation (R2R part) for the paper "Iterative Vision-and-Language Navigation" ⭐ 14 · Updated 11 months ago
- Official codebase for EmbCLIP ⭐ 120 · Updated last year
- Official implementation of "Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel" ⭐ 19 · Updated 3 months ago
- ⭐ 43 · Updated last year