gicheonkang / clip-rt
CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision
☆11 · Updated 3 months ago
Alternatives and similar repositories for clip-rt:
Users interested in clip-rt are comparing it to the repositories listed below.
- Official PyTorch Implementation for CVPR'23 Paper, "The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training" ☆19 · Updated last year
- Official Implementation of ReALFRED (ECCV'24) ☆35 · Updated 4 months ago
- Implementation (R2R part) for the paper "Iterative Vision-and-Language Navigation" ☆13 · Updated 10 months ago
- NeurIPS 2022 Paper "VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation" ☆86 · Updated last year
- Code for MM 22 "Target-Driven Structured Transformer Planner for Vision-Language Navigation" ☆14 · Updated 2 years ago
- Code and models of MOCA (Modular Object-Centric Approach) proposed in "Factorizing Perception and Policy for Interactive Instruction Foll…" ☆37 · Updated 8 months ago
- Implementation of our ICCV 2023 paper DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation ☆19 · Updated last year
- ☆44 · Updated 10 months ago
- Visual Representation Learning with Stochastic Frame Prediction (ICML 2024) ☆17 · Updated 2 months ago
- Official Implementation of CL-ALFRED (ICLR'24) ☆20 · Updated 3 months ago
- Official Implementation of IVLN-CE: Iterative Vision-and-Language Navigation in Continuous Environments ☆30 · Updated last year
- [ICRA2023] Grounding Language with Visual Affordances over Unstructured Data ☆39 · Updated last year
- Official Code for Neural Systematic Binder ☆30 · Updated last year
- PyTorch Code and Data for EnvEdit: Environment Editing for Vision-and-Language Navigation (CVPR 2022) ☆31 · Updated 2 years ago
- PyTorch implementation of RoLD: Robot Latent Diffusion for Multi-Task Policy Modeling (MMM2025 Best Paper) ☆15 · Updated 6 months ago
- Instruction Following Agents with Multimodal Transformers ☆52 · Updated 2 years ago
- Official codebase for EmbCLIP ☆117 · Updated last year
- Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal tra… ☆90 · Updated last year
- A Python Package for Seamless Data Distribution in AI Workflows ☆21 · Updated last year
- ☆42 · Updated 9 months ago
- ☆43 · Updated 2 years ago
- Code of the ICCV 2023 paper "March in Chat: Interactive Prompting for Remote Embodied Referring Expression" ☆25 · Updated 9 months ago
- Official implementation of Layout-aware Dreamer for Embodied Referring Expression Grounding (AAAI'23) ☆16 · Updated last year
- Code for NeurIPS 2022 Datasets and Benchmarks paper "EgoTaskQA: Understanding Human Tasks in Egocentric Videos" ☆30 · Updated last year
- PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer" ☆13 · Updated 2 years ago
- SNARE Dataset with MATCH and LaGOR models ☆24 · Updated 10 months ago
- NSRM: Neuro-Symbolic Robot Manipulation ☆13 · Updated last year
- ☆29 · Updated last year
- Code for the paper "ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts" (CVPR 2022) ☆10 · Updated 2 years ago