WatchTower-Liu / VLM-learning
Building a VLM from its basic modules.
☆10Updated 7 months ago
Related projects
Alternatives and complementary repositories for VLM-learning
- ☆13Updated 3 months ago
- Build a simple, basic multimodal large model from scratch. 🤖☆18Updated 5 months ago
- ☆55Updated 9 months ago
- Chinese CLIP models with SOTA performance.☆48Updated last year
- Workshop on Foundation Model 1st foundation model challenge Track1 codebase (Open TransMind v1.0)☆18Updated last year
- Personal project repository: applications of several large language models and multimodal models☆123Updated 2 weeks ago
- Research Code for Multimodal-Cognition Team in Ant Group☆123Updated 4 months ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.☆69Updated 2 months ago
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.☆35Updated last month
- A simple MLLM that surpasses QwenVL-Max using only open-source data with a 14B LLM.☆36Updated 2 months ago
- ☆30Updated 6 months ago
- ☆66Updated last year
- The llava-Qwen2-7B-Instruct-Chinese-CLIP model improves Chinese text recognition and the understanding of meme subtext, approaching the recognition level of gpt4o and claude-3.5-sonnet!☆13Updated 3 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆26Updated last month
- Vary-tiny codebase built on LAVIS (for training from scratch) and a PDF image-text pair dataset (about 600k pairs, including English and Chinese)☆68Updated 2 months ago
- ☆156Updated 8 months ago
- ☆68Updated last week
- DeepSpeed tutorial, annotated examples, and study notes (efficient training of large models)☆116Updated last year
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆51Updated 3 weeks ago
- Precision Search through Multi-Style Inputs☆54Updated 3 months ago
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)☆58Updated 10 months ago
- Notes mainly covering multimodal knowledge for large language model (LLM) algorithm (application) engineers☆86Updated 6 months ago
- A Qwen-VL model that can be successfully fine-tuned with LoRA☆16Updated last year
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed☆57Updated 3 weeks ago
- A curated collection of news on domestic and international data competitions☆18Updated 3 years ago
- An open-source multimodal large language model based on baichuan-7b☆72Updated 11 months ago
- The official code for NeurIPS 2024 paper: Harmonizing Visual Text Comprehension and Generation☆73Updated this week
- ☆77Updated 6 months ago
- A native Chinese, multi-level benchmark for evaluating text-to-video generation☆17Updated 4 months ago
- [ArXiv] PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling☆98Updated last month