AI-Study-Han / Zero-Qwen-VL
Train a LLaVA model with better Chinese support, and open-source the training code and data.
☆21Updated 2 weeks ago
Related projects:
- ☆64Updated 4 months ago
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM.☆35Updated last week
- Visual Instruction Tuning for Qwen2 Base Model☆14Updated 2 months ago
- Train a small-parameter vision multimodal VLM from scratch; open-source release expected within 2024☆23Updated this week
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.☆64Updated 2 weeks ago
- Music large model based on InternLM2-chat.☆21Updated last month
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆45Updated 4 months ago
- ☆70Updated 6 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆36Updated 2 months ago
- ☆29Updated 3 months ago
- An offline embodied-intelligence guide dog based on the InternLM2 large model☆60Updated 5 months ago
- Search, organize, discover anything!☆44Updated 5 months ago
- ☆46Updated 6 months ago
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train…☆51Updated 5 months ago
- Official code for our paper, "LoRA-Pro: Are Low-Rank Adapters Properly Optimized?"☆49Updated last month
- ☆53Updated 7 months ago
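- Projects and tutorials for MiniCPM and MiniCPM-V, covering six topics: inference, quantization, edge deployment, fine-tuning, technical reports, and applications☆87Updated last week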
- LLM Tokenizer with BPE algorithm☆23Updated 4 months ago
- ☆32Updated 6 months ago
- ☆54Updated 3 weeks ago
- Qwen-WisdomVast is a large model trained on 1 million high-quality Chinese multi-turn SFT data, 200,000 English multi-turn SFT data, and …☆18Updated 5 months ago
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora☆36Updated 5 months ago
- Fantastic Data Engineering for Large Language Models☆38Updated last month
- Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi…☆36Updated 8 months ago
- Run through the ChatGPT technical roadmap from scratch.☆103Updated 2 weeks ago
- ☆25Updated 4 months ago
- The official code of "RWKV-CLIP: A Robust Vision-Language Representation Learner"☆97Updated 2 months ago
- A Multi-modal RAG Project with Dataset from Honor of Kings, one of the most popular smartphone games in China☆51Updated 3 weeks ago
- official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"☆102Updated 3 months ago
- A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, qwen-vl, phi3-v …☆123Updated last week