☆57 · Apr 4, 2024 · Updated last year
Alternatives and similar repositories for EgoCOT_Dataset
Users interested in EgoCOT_Dataset are comparing it to the libraries listed below.
- Implementation of "A Neural Compositional Paradigm for Image Captioning" by B. Dai, S. Fidler, D. Lin (☆12 · Mar 15, 2019 · Updated 6 years ago)
- Code and Dataset for the CVPRW Paper "Where did I leave my keys? — Episodic-Memory-Based Question Answering on Egocentric Videos" (☆29 · Aug 28, 2023 · Updated 2 years ago)
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain (☆106 · Mar 14, 2024 · Updated last year)
- ☆32 · Feb 8, 2024 · Updated 2 years ago
- Generative Bias for Robust Visual Question Answering (CVPR 2023) (☆28 · Jul 4, 2023 · Updated 2 years ago)
- Code used by the paper "What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?" (☆14 · Sep 25, 2017 · Updated 8 years ago)
- [CoRL 2024] Official repo of "A3VLM: Actionable Articulation-Aware Vision Language Model" (☆121 · Oct 7, 2024 · Updated last year)
- Code of the ICCV 2023 paper "March in Chat: Interactive Prompting for Remote Embodied Referring Expression" (☆26 · May 22, 2024 · Updated last year)
- ☆36 · Dec 13, 2023 · Updated 2 years ago
- [NeurIPS 2024] MSR3D: Advanced Situated Reasoning in 3D Scenes (☆70 · Dec 2, 2025 · Updated 3 months ago)
- Python library to control the GX11 (Dexterous Hand) and EX12 (Exoskeleton Glove) (☆15 · Aug 30, 2025 · Updated 6 months ago)
- Full code for HiCRISP, covering VirtualHome, the PyBullet simulator, and a real AGV platform (☆15 · Apr 8, 2024 · Updated last year)
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder and unifying image understanding and generation (☆39 · Jun 20, 2024 · Updated last year)
- [IROS 2024 Oral] ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models (☆102 · Aug 22, 2024 · Updated last year)
- OpenEQA: Embodied Question Answering in the Era of Foundation Models (☆341 · Sep 20, 2024 · Updated last year)
- HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction (☆41 · Sep 15, 2025 · Updated 5 months ago)
- RobotVQA is a project that develops a Deep Learning-based Cognitive Vision System to support household robots' perception while they perf… (☆18 · Jul 26, 2024 · Updated last year)
- ☆16 · Oct 21, 2024 · Updated last year
- ☆18 · May 14, 2024 · Updated last year
- Repository for the paper "Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models" (☆37 · Sep 19, 2023 · Updated 2 years ago)
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models (☆45 · Jun 14, 2024 · Updated last year)
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model (☆281 · Jun 25, 2024 · Updated last year)
- ☆264 · Mar 17, 2024 · Updated last year
- Prompter for Embodied Instruction Following (☆18 · Nov 30, 2023 · Updated 2 years ago)
- An unofficial AI implementation of the DreamGym framework from the paper "Scaling Agent Learning via Experience Synthesis" (arXi… (☆37 · Nov 9, 2025 · Updated 3 months ago)
- ☆21 · Oct 10, 2023 · Updated 2 years ago
- Code release for the paper "Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control" (☆17 · Apr 9, 2024 · Updated last year)
- [ICCV 2023] ViLLA: Fine-grained vision-language representation learning from real-world data (☆46 · Oct 15, 2023 · Updated 2 years ago)
- ☆24 · May 8, 2024 · Updated last year
- PyTorch code for Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles (DANCE) (☆23 · Nov 29, 2022 · Updated 3 years ago)
- PyTorch implementation of the Hiveformer research paper (☆48 · Jun 27, 2023 · Updated 2 years ago)
- Repository for DialFRED (☆45 · Sep 14, 2023 · Updated 2 years ago)
- Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision (☆42 · Oct 19, 2025 · Updated 4 months ago)
- [NeurIPS 2022] Egocentric Video-Language Pretraining (☆256 · May 9, 2024 · Updated last year)
- Cooperative Vision-and-Dialog Navigation (☆72 · Nov 22, 2022 · Updated 3 years ago)
- Task planning over 3D scene graphs (☆19 · Jul 8, 2022 · Updated 3 years ago)
- [ICCV 2025 Oral] Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos (☆164 · Oct 1, 2025 · Updated 5 months ago)
- [ICRA 2023] Grounding Language with Visual Affordances over Unstructured Data (☆45 · Oct 29, 2023 · Updated 2 years ago)
- Suite of human-collected datasets and a multi-task continuous control benchmark for open-vocabulary visuolinguomotor learning (☆351 · Feb 20, 2026 · Updated last week)