gicheonkang / clip-rt
CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision
★9 · Updated 2 months ago
Alternatives and similar repositories for clip-rt:
Users who are interested in clip-rt are comparing it to the repositories listed below.
- Official PyTorch Implementation for CVPR'23 Paper, "The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training" · ★19 · Updated last year
- Code and models of MOCA (Modular Object-Centric Approach) proposed in "Factorizing Perception and Policy for Interactive Instruction Foll… · ★37 · Updated 6 months ago
- Official Implementation of ReALFRED (ECCV'24) · ★31 · Updated 3 months ago
- PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer" · ★13 · Updated last year
- Implementation (R2R part) for the paper "Iterative Vision-and-Language Navigation" · ★13 · Updated 9 months ago
- Official Implementation of IVLN-CE: Iterative Vision-and-Language Navigation in Continuous Environments · ★28 · Updated last year
- PyTorch Code and Data for EnvEdit: Environment Editing for Vision-and-Language Navigation (CVPR 2022) · ★31 · Updated 2 years ago
- Official Implementation of CL-ALFRED (ICLR'24) · ★19 · Updated 2 months ago
- Code for MM 22 "Target-Driven Structured Transformer Planner for Vision-Language Navigation" · ★14 · Updated 2 years ago
- Prompter for Embodied Instruction Following · ★18 · Updated last year
- Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal tra… · ★88 · Updated last year
- Official Implementation of CAPEAM (ICCV'23) · ★11 · Updated last month
- ★43 · Updated 2 years ago
- NeurIPS 2022 Paper "VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation" · ★83 · Updated last year
- Repository for DialFRED. · ★42 · Updated last year
- Official implementation of Layout-aware Dreamer for Embodied Referring Expression Grounding (AAAI'23). · ★16 · Updated last year
- ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO · ★13 · Updated this week
- Official codebase for EmbCLIP · ★117 · Updated last year
- Visual Representation Learning with Stochastic Frame Prediction (ICML 2024) · ★16 · Updated last month
- Code for NeurIPS 2022 Datasets and Benchmarks paper - EgoTaskQA: Understanding Human Tasks in Egocentric Videos. · ★30 · Updated last year
- Code for NeurIPS 2021 paper "Curriculum Learning for Vision-and-Language Navigation" · ★15 · Updated 2 years ago
- Official code for the ACL 2021 Findings paper "Yichi Zhang and Joyce Chai. Hierarchical Task Learning from Language Instructions with Uni… · ★24 · Updated 3 years ago
- A Python Package for Seamless Data Distribution in AI Workflows · ★21 · Updated last year
- Official repository of ICLR 2022 paper FILM: Following Instructions in Language with Modular Methods · ★118 · Updated last year
- Official implementation of History Aware Multimodal Transformer for Vision-and-Language Navigation (NeurIPS'21). · ★107 · Updated last year
- ★43 · Updated 9 months ago
- Codebase for the Airbert paper · ★42 · Updated last year
- Training code of waypoint predictor in Discrete-to-Continuous VLN. · ★18 · Updated 9 months ago
- Code of the ICCV 2023 paper "March in Chat: Interactive Prompting for Remote Embodied Referring Expression" · ★25 · Updated 7 months ago
- ★19 · Updated 2 years ago