km1994 / AwesomeMultiModelLinks
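[AIGC Hands-on Starter Notes: The AIGC Skyscraper] Practical experience and study material on large language models (LLMs), efficient fine-tuning of large models (SFT), retrieval-augmented generation (RAG), agents, automatic PPT generation, role-playing, text-to-image (Stable Diffusion), optical character recognition (OCR), automatic speech recognition (ASR), text-to-speech (TTS), portrait segmentation (SA), multimodal models (VLM), AI face swapping, text-to-video (VD), image-to-video (SVD), AI motion transfer, AI virtual try-on, digital humans, omni-modal understanding (Omni), AI music generation, and more.
☆52 · Updated 8 months ago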
Alternatives and similar repositories for AwesomeMultiModel
Users that are interested in AwesomeMultiModel are comparing it to the libraries listed below
- Building a VLM starts from the basic modules. ☆18 · Updated last year
- Chinese CLIP models with SOTA performance. ☆60 · Updated 2 years ago
- This project uses the LLaVA 1.6 multimodal model to implement text-to-image search and image-to-image search. ☆28 · Updated last year
- Research Code for Multimodal-Cognition Team in Ant Group ☆171 · Updated 3 months ago
- Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi… ☆52 · Updated 2 years ago
- ☆18 · Updated 2 years ago
- Precision Search through Multi-Style Inputs ☆73 · Updated 5 months ago
- Toward Universal Multimodal Embedding ☆73 · Updated 5 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆128 · Updated last year
- ☆42 · Updated 11 months ago
- ☆32 · Updated 3 years ago
- We hope to train a VLM to be a beauty master that helps you solve dressing and beauty problems. ☆23 · Updated 3 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 · Updated last year
- An AIGC application integrating an LLM with SDXL ☆29 · Updated 2 years ago
- Official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval" ☆41 · Updated 6 months ago
- Image retrieval system based on CNN feature distance and triplet loss ☆31 · Updated 4 years ago
- ☆72 · Updated 2 years ago
- Ecosystem projects around large language models and multimodal models, mainly covering cross-modal search, speculative decoding, QAT quantization, multimodal quantization, chatbots, and OCR ☆194 · Updated 5 months ago
- Demo of fine-tuning a multimodal LLM with LLaMA-Factory ☆56 · Updated last year
- ☆16 · Updated last year
- Training code for Taiyi-Diffusion-XL ☆23 · Updated last year
- [WWW 2025] Official PyTorch Code for "CTR-Driven Advertising Image Generation with Multimodal Large Language Models" ☆60 · Updated 5 months ago
- Official code for the paper "Exploring Domain Incremental Video Highlights Detection with the LiveFood Benchmark" ☆40 · Updated 2 years ago
- The official implementation of our paper "Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption" ☆38 · Updated 7 months ago
- Adds some files missing from Visualglm so that Visualglm can be trained; the example performs face-reading (physiognomy) recognition on faces. ☆13 · Updated 2 years ago
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM ☆101 · Updated last year
- Build a simple basic multimodal large model from scratch. 🤖 ☆47 · Updated last year
- Our 2nd-gen LMM ☆34 · Updated last year
- The llava-Qwen2-7B-Instruct-Chinese-CLIP model strengthens Chinese text recognition and the ability to grasp the meaning of memes, approaching the recognition level of gpt4o and claude-3.5-sonnet! ☆27 · Updated last year
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support. ☆146 · Updated 11 months ago