tulerfeng / Awesome-Embodied-Multimodal-LLMs
Latest Advances on Embodied Multimodal LLMs (or Vision-Language-Action Models).
☆53 · Updated 4 months ago
Related projects
Alternatives and complementary repositories for Awesome-Embodied-Multimodal-LLMs
- [CVPR 2024] On the Road to Portability: Compressing End-to-End Motion Planner for Autonomous Driving ☆128 · Updated 7 months ago
- Official PyTorch implementation of CODA-LM (https://arxiv.org/abs/2404.10595) ☆68 · Updated 3 weeks ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆75 · Updated 2 months ago
- [CVPR 2024 Highlight] The official repo for the paper "Abductive Ego-View Accident Video Understanding for Safe Driving Perception" ☆30 · Updated last month
- A paper list of recent works on token compression for ViT and VLM ☆149 · Updated this week
- Official implementation of the paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" proposed by Pekin… ☆56 · Updated last month
- Automatically updates arXiv papers about SOT & VLT, Multi-modal Learning, LLMs, and Video Understanding using GitHub Actions. ☆19 · Updated this week
- Official implementation of the paper "Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal …" ☆27 · Updated this week
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?" ☆148 · Updated last month
- An RLHF Infrastructure for Vision-Language Models ☆111 · Updated last week
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆26 · Updated 4 months ago
- ✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆79 · Updated this week
- [AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario. ☆160 · Updated 3 weeks ago
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … ☆139 · Updated last month
- A Multi-Modal Large Language Model with Retrieval-augmented In-context Learning capacity designed for generalisable and explainable end-t… ☆75 · Updated last month
- Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving ☆72 · Updated 10 months ago
- [ECCV 2024] The official code for "Dolphins: Multimodal Language Model for Driving" ☆49 · Updated 4 months ago
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction". ☆45 · Updated 3 weeks ago
- A Survey on Benchmarks of Multimodal Large Language Models ☆66 · Updated last month
- A paper collection on autoregressive models in vision. ☆233 · Updated this week
- AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segm… ☆69 · Updated last month
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs ☆71 · Updated 2 weeks ago
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆99 · Updated last week
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆144 · Updated last month
- A systematic survey of multi-modal and multi-task visual understanding foundation models for driving scenarios ☆47 · Updated 5 months ago
- [CVPR 2024] The official implementation of MP5 ☆84 · Updated 4 months ago
- [NeurIPS 2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model ☆83 · Updated 11 months ago