☆57Apr 4, 2024Updated 2 years ago
Alternatives and similar repositories for EgoCOT_Dataset
Users that are interested in EgoCOT_Dataset are comparing it to the libraries listed below.
- Code and Dataset for the CVPRW Paper "Where did I leave my keys? — Episodic-Memory-Based Question Answering on Egocentric Videos"☆29Aug 28, 2023Updated 2 years ago
- Implementation of 'A Neural Compositional Paradigm for Image Captioning' by B. Dai, S. Fidler, D. Lin☆12Mar 15, 2019Updated 7 years ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆107Mar 14, 2024Updated 2 years ago
- [IROS24 Oral] ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models☆102Aug 22, 2024Updated last year
- [NeurIPS 2024] MSR3D: Advanced Situated Reasoning in 3D Scenes☆73Dec 2, 2025Updated 6 months ago
- [CoRL2024] Official repo of `A3VLM: Actionable Articulation-Aware Vision Language Model`☆122Oct 7, 2024Updated last year
- ☆282Mar 17, 2024Updated 2 years ago
- Code of the ICCV 2023 paper "March in Chat: Interactive Prompting for Remote Embodied Referring Expression"☆26May 22, 2024Updated 2 years ago
- ☆31Nov 6, 2024Updated last year
- OpenEQA Embodied Question Answering in the Era of Foundation Models☆363Sep 20, 2024Updated last year
- ☆37Dec 13, 2023Updated 2 years ago
- Python library to control GX11(Dexterous Hand) and EX12(Exoskeleton Glove)☆17Aug 30, 2025Updated 9 months ago
- Codebase for ICLR 2023 paper, "SMART: Self-supervised Multi-task pretrAining with contRol Transformers"☆54Jan 26, 2024Updated 2 years ago
- GQA-OOD is a new dataset and benchmark for the evaluation of VQA models in OOD (out of distribution) settings.☆33Mar 1, 2021Updated 5 years ago
- Generative Bias for Robust Visual Question Answering (CVPR 2023)☆29Jul 4, 2023Updated 2 years ago
- ☆33Sep 22, 2024Updated last year
- Repository for DialFRED.☆45Sep 14, 2023Updated 2 years ago
- ☆21Oct 10, 2023Updated 2 years ago
- LLaVA combines with Magvit Image tokenizer, training MLLM without a Vision Encoder. Unifying image understanding and generation.☆38Jun 20, 2024Updated last year
- [ICCV2025 Oral] Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos☆177Oct 1, 2025Updated 8 months ago
- Pytorch implementation for Egoinstructor at CVPR 2024☆28Dec 1, 2024Updated last year
- Prompter for Embodied Instruction Following☆18Nov 30, 2023Updated 2 years ago
- HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction☆41Sep 15, 2025Updated 8 months ago
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models☆37Sep 19, 2023Updated 2 years ago
- CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning☆30May 23, 2026Updated 3 weeks ago
- Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision☆46Oct 19, 2025Updated 7 months ago
- [ICRA2023] Grounding Language with Visual Affordances over Unstructured Data☆48Oct 29, 2023Updated 2 years ago
- [ICML 2024] LEO: An Embodied Generalist Agent in 3D World☆485Apr 20, 2025Updated last year
- Code release for the paper "Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control"☆17Apr 9, 2024Updated 2 years ago
- Suite of human-collected datasets and a multi-task continuous control benchmark for open vocabulary visuolinguomotor learning.☆360May 20, 2026Updated 3 weeks ago
- ☆13Nov 1, 2023Updated 2 years ago
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model☆281Jun 25, 2024Updated last year
- RobotVQA is a project that develops a Deep Learning-based Cognitive Vision System to support household robots' perception while they perf…☆18Jul 26, 2024Updated last year
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆45Jun 14, 2024Updated 2 years ago
- ☆16Oct 21, 2024Updated last year
- A PyTorch re-implementation of the RT-1 (Robotics Transformer)☆52Oct 18, 2023Updated 2 years ago
- Implementation of RT1 (Robotic Transformer) in Pytorch☆450Oct 6, 2024Updated last year
- [arXiv 2023] Embodied Task Planning with Large Language Models☆195Aug 22, 2023Updated 2 years ago
- ☆10Oct 7, 2024Updated last year