km1994 / AwesomeMultiModel
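[AIGC Hands-On Beginner Notes: the AIGC Skyscraper] Shares hands-on practice and experience with large language models (LLMs), efficient fine-tuning of large models (SFT), retrieval-augmented generation (RAG), agents (Agent), automatic PPT generation, role-playing, text-to-image (Stable Diffusion), optical character recognition (OCR), automatic speech recognition (ASR), text-to-speech (TTS), portrait segmentation (SA), multimodal models (VLM), AI face swapping (Face Swapping), text-to-video (VD), image-to-video (SVD), AI motion transfer, AI virtual try-on, digital humans, omni-modal understanding (Omni), AI music generation, and more.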
☆55 Updated 9 months ago
Alternatives and similar repositories for AwesomeMultiModel
Users who are interested in AwesomeMultiModel are comparing it to the repositories listed below.
- Demo of Finetuning Multimodal LLM with LLaMA-Factory ☆56 Updated last year
- Precision Search through Multi-Style Inputs ☆73 Updated 6 months ago
- Training code for Taiyi-Diffusion-XL ☆23 Updated last year
- Build a simple, basic multimodal large model from scratch. 🤖 ☆47 Updated last year
- Chinese CLIP models with SOTA performance. ☆60 Updated 2 years ago
- Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi… ☆52 Updated 2 years ago
- ☆16 Updated last year
- An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08… ☆37 Updated 8 months ago
- ☆43 Updated last year
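- An ecosystem of large language models and multimodal models, mainly covering cross-modal search, speculative decoding, QAT quantization, multimodal quantization, ChatBot, and OCR ☆197 Updated last week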
- ☆18 Updated 2 years ago
- Research Code for Multimodal-Cognition Team in Ant Group ☆172 Updated 3 months ago
- [WWW 2025] Official PyTorch Code for "CTR-Driven Advertising Image Generation with Multimodal Large Language Models" ☆62 Updated 6 months ago
- Building a VLM model starts from the basic module. ☆18 Updated last year
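- 2025.01: Implemented a multimodal large model from scratch, named Reyes (睿视; R: 睿, eyes: 眼). Reyes has 8B parameters, uses InternViT-300M-448px-V2_5 as its vision encoder and Qwen2.5-7B-Instruct as its language model. Reyes also, through a two… ☆30 Updated last week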
- Official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval" ☆42 Updated 7 months ago
- ☆95 Updated 11 months ago
- An AIGC application integrating an LLM with SDXL ☆29 Updated 2 years ago
- The official implementation of our paper "Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption" ☆38 Updated 8 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆130 Updated last year
- We hope to train VLM to be a beauty master to help you solve the problem of dressing and beauty. ☆25 Updated 4 months ago
- This project provides a Tensor Parallel (TP) deployment tutorial for Hugging Face LLM models on the 910B, and can also serve as minimal TP learning code. ☆30 Updated last month
- WeChat official account: 机器感知 (Machine Perception) | Tracking the Latest Arxiv Papers ☆38 Updated 8 months ago
- [ICCV2025] A Token-level Text Image Foundation Model for Document Understanding ☆130 Updated 5 months ago
- Toward Universal Multimodal Embedding ☆73 Updated 6 months ago
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora ☆40 Updated last year
- EraseAnything, ICML 2025 ☆38 Updated 4 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 Updated last year
- This project uses the LLaVA 1.6 multimodal model to implement text-to-image and image-to-image search. ☆28 Updated last year
- ☆59 Updated 11 months ago