km1994 / AwesomeMultiModel
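[AIGC Hands-on Introductory Notes: the AIGC Skyscraper] Shares hands-on practice, experience, and practical learning material on large language models (LLMs), efficient fine-tuning of large models (SFT), retrieval-augmented generation (RAG), agents (Agent), automatic PPT generation, role-playing, text-to-image (Stable Diffusion), optical character recognition (OCR), automatic speech recognition (ASR), text-to-speech (TTS), portrait segmentation (SA), multimodal models (VLM), AI face swapping (Face Swapping), text-to-video (VD), image-to-video (SVD), AI motion transfer, AI virtual try-on, digital humans, omni-modal understanding (Omni), AI music generation, and more.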
☆25 Updated 4 months ago
Alternatives and similar repositories for AwesomeMultiModel
Users that are interested in AwesomeMultiModel are comparing it to the libraries listed below
- Chinese CLIP models with SOTA performance. ☆57 Updated 2 years ago
- ☆24 Updated 3 years ago
- Code for the Video Similarity Challenge. ☆80 Updated last year
- ☆15 Updated 10 months ago
- ☆70 Updated 2 years ago
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆27 Updated last year
- Research Code for Multimodal-Cognition Team in Ant Group ☆164 Updated last month
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM. ☆37 Updated 11 months ago
- Masked Vision-Language Transformer in Fashion ☆35 Updated last year
- An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08… ☆30 Updated 2 months ago
- ☆57 Updated last year
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆10 Updated 2 years ago
- IAE Dataset, produced by Chaoran Cui, Zhen Shen, Jun Yu. A large scale dataset to facilitate multi-task learning for unified image aesthet… ☆19 Updated 3 years ago
- Toward Universal Multimodal Embedding ☆55 Updated last month
- A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p… ☆14 Updated last year
- official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval" ☆33 Updated last month
- official code for paper: Exploring Domain Incremental Video Highlights Detection with the LiveFood Benchmark ☆39 Updated last year
- Non-local Modeling for Image Quality Assessment ☆13 Updated last year
- [IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer ☆104 Updated last year
- TransVCL: Attention-enhanced Video Copy Localization Network with Flexible Supervision [AAAI2023 Oral] ☆55 Updated 2 years ago
- Chinese text encoder for CLIP ☆22 Updated 3 years ago
- Our 2nd-gen LMM ☆34 Updated last year
- Bling's Object detection tool ☆56 Updated 2 years ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types ☆27 Updated last month
- The top conferences on video retrieval libraries in recent years, synchronized with my blog. ☆14 Updated 3 years ago
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023 ☆15 Updated 9 months ago
- Facebook Image Similarity Challenge 2021 ☆19 Updated 3 years ago
- EssentialMC2 Video Understanding. ☆113 Updated 2 years ago
- ☆17 Updated 2 years ago
- Avatar: An easy-to-use digital portrait PPT presentation video generation system based on Gradio ☆20 Updated last year