opendatalab / image-downloader
☆27Updated 9 months ago
Alternatives and similar repositories for image-downloader:
Users who are interested in image-downloader are comparing it to the libraries listed below.
- Chinese CLIP models with SOTA performance.☆53Updated last year
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆115Updated 3 months ago
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM.☆37Updated 5 months ago
- ☆56Updated last year
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora☆40Updated 11 months ago
- Our 2nd-gen LMM☆32Updated 9 months ago
- ☆78Updated 9 months ago
- ☆67Updated last year
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM☆100Updated 9 months ago
- ☆104Updated last year
- An open-source multimodal large language model based on baichuan-7b☆73Updated last year
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model☆23Updated last year
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a Vision Encoder. Unifying image understanding and generation.☆35Updated 8 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆40Updated 8 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated 5 months ago
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training …☆29Updated 4 months ago
- ☆36Updated 4 months ago
- ☆171Updated 3 weeks ago
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2☆115Updated 3 months ago
- Precision Search through Multi-Style Inputs☆64Updated 7 months ago
- ☆28Updated 6 months ago
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)☆132Updated last month
- Building a VLM model starts from the basic module.☆13Updated 10 months ago
- Taiyi-Diffusion-XL training code☆21Updated 8 months ago
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.☆216Updated this week
- Awesome Colab Projects Collection☆25Updated last year
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆59Updated 4 months ago