kq-chen / qwen-vl-utils
Helper functions for processing and integrating visual-language information with Qwen-VL series models
☆16 Updated last year
Alternatives and similar repositories for qwen-vl-utils
Users who are interested in qwen-vl-utils are comparing it to the libraries listed below.
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆43 Updated last year
- Our 2nd-gen LMM ☆34 Updated last year
- ☆75 Updated last year
- Touchstone: Evaluating Vision-Language Models by Language Models ☆83 Updated last year
- MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua… ☆63 Updated 6 months ago
- Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details. ☆31 Updated 9 months ago
- Reproduction of LLaVA-v1.5 based on Llama-3-8b LLM backbone. ☆65 Updated last year
- ☆50 Updated 2 years ago
- ☆17 Updated 2 years ago
- Exploration of the multi modal fuyu-8b model of Adept. 🤓 🔍 ☆27 Updated 2 years ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆97 Updated last year
- ☆107 Updated 3 weeks ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆137 Updated last year
- ☆29 Updated last year
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆64 Updated last year
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆109 Updated 6 months ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆126 Updated 10 months ago
- ☆65 Updated last year
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM. ☆38 Updated last year
- A huge dataset for Document Visual Question Answering ☆20 Updated last year
- Official code for infimm-hd ☆16 Updated last year
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo… ☆84 Updated 10 months ago
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆48 Updated last year
- ☆87 Updated last year
- [NAACL 2025] Representing Rule-based Chatbots with Transformers ☆23 Updated 10 months ago
- Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]; ☆41 Updated last year
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆20 Updated 6 months ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆33 Updated last year
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever. ☆103 Updated 6 months ago
- An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08… ☆34 Updated 6 months ago