weleen / awesome-agent
Repository about single/multi-agent, robotics, llm/vlm/vla, scientific discovery, etc.
☆17Updated 4 months ago
Alternatives and similar repositories for awesome-agent
Users who are interested in awesome-agent are comparing it to the libraries listed below
- [CVPRW 2024] TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning. Official code for the 3rd place solution of t…☆48Updated 9 months ago
- ☆46Updated 5 months ago
- ☆72Updated 6 months ago
- A Multi-Modal Large Language Model with Retrieval-augmented In-context Learning capacity designed for generalisable and explainable end-t…☆113Updated last year
- Parameter-Efficient Fine-Tuning for Foundation Models☆99Updated 7 months ago
- ☆58Updated last year
- The official Talk2Car dataset repo☆90Updated 2 months ago
- ☆54Updated last year
- [ECCV 2024] The official code for "Dolphins: Multimodal Language Model for Driving"☆84Updated 9 months ago
- [CVPR2024 Highlight] The official repo for paper "Abductive Ego-View Accident Video Understanding for Safe Driving Perception"☆62Updated 7 months ago
- [WACV 2024 LLVM-AD Challenge] UCU Dataset☆16Updated 2 years ago
- ☆10Updated last year
- ☆27Updated last year
- [IEEE T-IV] A systematic survey of multi-modal and multi-task visual understanding foundation models for driving scenarios☆50Updated last year
- [Communications in Transportation Research] Official PyTorch Implementation of "GPT-4 enhanced multimodal grounding for autonomous driv…☆25Updated last year
- Agentic MLLMs☆77Updated 3 weeks ago
- ☆88Updated last year
- [AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.☆215Updated last year
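- A multimodal large model implemented from scratch and named Reyes (睿视; R for 睿, "eyes" for 眼). Reyes has 8B parameters, uses InternViT-300M-448px-V2_5 as its vision encoder and Qwen2.5-7B-Instruct as its language model, and also uses a two-layer MLP projection layer to connect…☆26Updated 9 months ago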
- [AAAI2025] Language Prompt for Autonomous Driving☆150Updated 2 months ago
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training …☆59Updated 6 months ago
- VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection☆20Updated 5 months ago
- ☆25Updated 2 years ago
- [NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning☆70Updated 9 months ago
- ☆99Updated 10 months ago
- [NeurIPS 2025] Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO☆64Updated 3 weeks ago
- [ECCV 2024] Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving☆94Updated last year
- Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023)☆59Updated last year
- [NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model☆88Updated last year
- SFT+RL boosts multimodal reasoning☆37Updated 4 months ago