HCPLab-SYSU / Book-of-MLM
"Multimodal Large Models: A New Generation of Artificial Intelligence Technology Paradigm", by Yang Liu and Liang Lin
☆204Updated 5 months ago
Alternatives and similar repositories for Book-of-MLM:
Users who are interested in Book-of-MLM are comparing it to the repositories listed below
- Notes mainly covering multimodal knowledge for large language model (LLM) algorithm (application) engineers☆185Updated 11 months ago
- ☆318Updated 2 months ago
- Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models☆404Updated 3 weeks ago
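- Theory and practice of large model/LLM inference and deployment☆250Updated last month
- Applications of large language models and multimodal models, mainly covering small models, agents, cross-modal search, OCR, RAG, chatbots, and more☆166Updated last week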
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent☆310Updated 2 weeks ago
- Hugging Vision, Hugging AGI.☆147Updated last month
- Unlocking the many uses of the HuggingFace ecosystem☆90Updated 4 months ago
- ☆37Updated last month
- llm & rl☆112Updated last week
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey☆542Updated 2 weeks ago
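- Hand-written interview questions for AI algorithms, with a focus on LLMs plus search, advertising, and recommendation (not LeetCode), such as Self-Attention and AUC; they generally test a candidate's overall ability more than LeetCode does and stay closer to business needs and fundamentals☆250Updated 4 months ago
- Training a LLaVA model with better Chinese support, with open-sourced training code and data.☆54Updated 8 months ago
- A hand-built Agent demo based on ReAct☆128Updated last week
- DeepSpeed tutorial, annotated examples, and study notes (efficient large-model training)☆161Updated last year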
- Awesome-RAG-Vision: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision☆144Updated last week
- Latest Advances on Long Chain-of-Thought Reasoning☆273Updated 3 weeks ago
- Collect every awesome work about r1!☆356Updated this week
- ☆156Updated 8 months ago
- Llama3-Tutorial(XTuner、LMDeploy、OpenCompass)☆507Updated 11 months ago
- ☆64Updated last year
- ☆103Updated last year
- A most Frontend Collection and survey of vision-language model papers, and models GitHub repository☆180Updated last week
- ☆71Updated 7 months ago
- A Survey on Multimodal Retrieval-Augmented Generation☆156Updated 2 weeks ago
- Efficient Multimodal Large Language Models: A Survey☆343Updated last week
- WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge☆120Updated 5 months ago
- R1-onevision, a visual language model capable of deep CoT reasoning.☆513Updated 3 weeks ago
- Companion data and code for the book "Natural Language Processing: Large Model Theory and Practice"☆62Updated 4 months ago
- ☆196Updated last week