BUAADreamer / MLLM-Finetuning-Demo

Demo of fine-tuning multimodal LLMs with LLaMA-Factory

☆44

Alternatives and similar repositories for MLLM-Finetuning-Demo

Users who are interested in MLLM-Finetuning-Demo are comparing it to the repositories listed below

percent4 / multi-modal-image-search
This project uses the LLaVA 1.6 multimodal model to implement text-to-image and image-to-image search.
☆23 Updated last year
xinyanghuang7 / Basic-Visual-Language-Model
Build a simple, basic multimodal large model from scratch. 🤖
☆43 Updated last year
WatchTower-Liu / VLM-learning
Building a VLM, starting from the basic modules.
☆16 Updated last year
LDLINGLINGLING / adan_application
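An ecosystem of applications for large language models and multimodal models, mainly covering cross-modal search, speculative decoding, QAT quantization, multimodal quantization, ChatBot, and OCR.
☆184 Updated 3 weeks ago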
zhangfaen / finetune-InternVL2
☆28 Updated 11 months ago
sunshine-JLU / deepseek-janus-pro-lora
The objective of this project is to demonstrate how to fine-tune deepseek-janus-pro-lora.
☆32 Updated last month
matthewchung74 / qwen_2_5_3B_GRPO_medical_thinking
☆48 Updated 3 months ago
AI-Study-Han / Zero-Qwen-VL
Trains a LLaVA model with better Chinese support, and open-sources the training code and data.
☆64 Updated 10 months ago
yujunhuics / Reyes
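Implements a multimodal large model from scratch, named Reyes (睿视; R: 睿, eyes: 眼). Reyes has 8B parameters; its vision encoder is InternViT-300M-448px-V2_5 and its language model is Qwen2.5-7B-Instruct. Reyes also uses a two-layer MLP projection layer to connect…
☆21 Updated 5 months ago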
liujunwen23 / MIRE
WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge
☆122 Updated 8 months ago
Alibaba-NLP / VRAG
Repo for "VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforce…
☆280 Updated 3 weeks ago
OvJat / DeepSpeedTutorial
DeepSpeed Tutorial
☆99 Updated 11 months ago
TongjiFinLab / CFGPT
Chinese Financial Assistant with Large Language Model
☆64 Updated 10 months ago
cnzzx / VSA
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
☆125 Updated 8 months ago
aliyun / qwen-dianjin
Qwen DianJin: LLMs for the Financial Industry by Alibaba Cloud
☆119 Updated last month
xiteng01 / CVPR2023_foundation_model_Track1
Workshop on Foundation Model 1st foundation model challenge Track1 codebase (Open TransMind v1.0)
☆18 Updated 2 years ago
owenliang / DeepSeek-Distill-Qwen-For-Child
☆47 Updated 4 months ago
NeverMoreLCH / SearchLVLMs
Repository for the NeurIPS 2024 paper "SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up…
☆24 Updated 7 months ago
Czi24 / Awesome-MLLM-LLM-Colab
Happy experimenting with MLLM and LLM models!
☆117 Updated 9 months ago
StarRing2022 / R1-Nature
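The simplest reproduction of R1 results on a small model, explaining the most important essence of O1-like models and DeepSeek R1. "Think is all your need". Uses experiments to support the claim that, for strong reasoning ability, the content of the thinking process is the core of AGI/ASI.
☆45 Updated 5 months ago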
YuhangWuAI / tablerag
Makes the RAG pipeline work better on table data.
☆90 Updated 9 months ago
HuggingAGI / HuggingArxiv
☆259 Updated 7 months ago
WangRongsheng / Med-R1
Encourage Medical LLM to engage in deep thinking similar to DeepSeek-R1.
☆25 Updated 2 months ago
reilxlx / llava-Qwen2-7B-Instruct-Chinese-CLIP
The llava-Qwen2-7B-Instruct-Chinese-CLIP model improves recognition of Chinese text and of the implied meaning of memes, approaching the recognition level of gpt4o and claude-3.5-sonnet!
☆23 Updated 11 months ago
scchy / XtunerGUI
Xtuner Factory
☆33 Updated last year
yaosenJ / CoalQA
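Fine-tunes the internlm2 model on data such as historical coal mine accident cases, accident handling reports, safety regulations and rules, technical documents, and onboarding exam question banks for coal mine workers, to provide intelligent Q&A on coal mine accidents and coal mine safety knowledge.
☆49 Updated 6 months ago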
jinbo0906 / Awesome-MLLM-Datasets
This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training …
☆50 Updated 2 months ago
Alibaba-NLP / OmniSearch
Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
☆350 Updated 2 months ago
OceanPresentChao / llm-ReAct
Build an LLM ReAct Agent framework from scratch
☆85 Updated 8 months ago
ding523 / Curr_REFT
☆64 Updated last month