tulerfeng / Awesome-Embodied-Multimodal-LLMs
Latest Advances on Embodied Multimodal LLMs (or Vision-Language-Action Models).
☆48 · Updated 2 months ago
Related projects:
- [CVPR 2024] On the Road to Portability: Compressing End-to-End Motion Planner for Autonomous Driving ☆121 · Updated 5 months ago
- Automatically update arXiv papers about SOT & VLT, multi-modal learning, LLMs and video understanding using GitHub Actions. ☆11 · Updated this week
- Official PyTorch implementation of CODA-LM (https://arxiv.org/abs/2404.10595) ☆57 · Updated 2 months ago
- [CVPR 2024 Highlight] The official repo for the paper "Abductive Ego-View Accident Video Understanding for Safe Driving Perception" ☆22 · Updated last week
- ✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆65 · Updated last week
- [Communications in Transportation Research] Official PyTorch implementation of "GPT-4 enhanced multimodal grounding for autonomous driv…" ☆18 · Updated 6 months ago
- [CVPR 2024] The official implementation of MP5 ☆72 · Updated 2 months ago
- Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving ☆70 · Updated 8 months ago
- Some experiences to help new researchers grow up ☆33 · Updated last year
- A Multi-Modal Large Language Model with Retrieval-augmented In-context Learning capacity designed for generalisable and explainable end-t… ☆63 · Updated 2 months ago
- [AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario. ☆145 · Updated 9 months ago
- This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models" ☆138 · Updated 5 months ago
- [COLING 2024 Oral] promISe: Releasing the Capabilities of LLMs with Prompt Introspective Search ☆17 · Updated 3 weeks ago
- Official implementation of Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models (ICLR 2024 Spotlight) ☆11 · Updated 6 months ago
- Papers about Hallucination in Multi-Modal Large Language Models (MLLMs) ☆43 · Updated 3 weeks ago
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs ☆45 · Updated last month
- Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning ☆93 · Updated 2 months ago
- Official GitHub repository for the paper "LingoQA: Video Question Answering for Autonomous Driving" ☆106 · Updated 5 months ago
- AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segm… ☆55 · Updated 2 weeks ago
- My personal homepage ☆55 · Updated this week
- A RLHF Infrastructure for Vision-Language Models ☆86 · Updated 3 months ago
- 😎 An up-to-date and curated list of awesome LMM hallucination papers, methods and resources. ☆140 · Updated 5 months ago
- [ECCV 2024] Embodied Understanding of Driving Scenarios ☆137 · Updated 2 weeks ago
- Official repository for NuScenes-MQA. The paper was accepted at the LLVM-AD Workshop at WACV 2024. ☆20 · Updated 9 months ago
- This repository compiles a list of papers related to Video LLMs. ☆16 · Updated 2 months ago