WatchTower-Liu / VLM-learning
Building a VLM starts from the basic modules.
☆10 · Updated 7 months ago
Related projects
Alternatives and complementary repositories for VLM-learning
- Build a simple, basic multimodal large model from scratch. 🤖 ☆16 · Updated 4 months ago
- Chinese CLIP models with SOTA performance. ☆48 · Updated last year
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM. ☆36 · Updated 2 months ago
- ☆11 · Updated 2 months ago
- Research Code for Multimodal-Cognition Team in Ant Group ☆122 · Updated 4 months ago
- ☆55 · Updated 9 months ago
- Workshop on Foundation Model 1st foundation model challenge Track1 codebase (Open TransMind v1.0) ☆18 · Updated last year
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆26 · Updated last month
- Multimodal chatbot with integrated computer vision capabilities ☆98 · Updated 5 months ago
- The official code for NeurIPS 2024 paper: Harmonizing Visual Text Comprehension and Generation ☆65 · Updated last month
- ☆30 · Updated 5 months ago
- ☆66 · Updated last year
- Taiyi-Diffusion-XL training code ☆21 · Updated 5 months ago
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆32 · Updated last month
- ☆156 · Updated 8 months ago
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed ☆55 · Updated 2 weeks ago
- Precision Search through Multi-Style Inputs ☆54 · Updated 3 months ago
- ☆32 · Updated 2 years ago
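- Personal project repository: applications of various large language models and multimodal models ☆117 · Updated this week
- ATEC2023, Track 1: Knowledge Injection for Large Models, Rank 7 solution write-up ☆19 · Updated 2 weeks ago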
- ☆27 · Updated 5 months ago
- ☆77 · Updated 6 months ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever. ☆68 · Updated 2 months ago
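- Adds some files missing from VisualGLM so that VisualGLM can be trained; the example performs physiognomy (face-reading) recognition on faces. ☆12 · Updated last year
- A Qwen-VL model that can be successfully fine-tuned with LoRA ☆16 · Updated last year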
- Large Multimodal Model ☆15 · Updated 7 months ago
- DeepSpeed tutorial, annotated examples, and study notes (efficient training of large models) ☆115 · Updated last year
- ☆11 · Updated 6 months ago
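- Model llava-Qwen2-7B-Instruct-Chinese-CLIP: enhanced Chinese text recognition and meme-meaning understanding, approaching the recognition level of gpt4o and claude-3.5-sonnet! ☆13 · Updated 3 months ago
- Chinese CLIP: supports custom datasets and extracts vectors from text and images to enable text-image matching. ☆21 · Updated 2 years ago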