kq-chen / qwen-vl-utils
helper functions for processing and integrating visual language information with Qwen-VL Series Model
☆14 Updated 10 months ago
Alternatives and similar repositories for qwen-vl-utils
Users that are interested in qwen-vl-utils are comparing it to the libraries listed below
- Our 2nd-gen LMM ☆33 Updated last year
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆44 Updated last year
- ☆29 Updated 10 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆85 Updated 8 months ago
- Scaling Preference Data Curation via Human-AI Synergy ☆69 Updated last week
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model ☆22 Updated last year
- [ACL 2025] An official pytorch implement of the paper: Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement ☆30 Updated last month
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2 ☆123 Updated 7 months ago
- Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details. ☆31 Updated 4 months ago
- ☆36 Updated 10 months ago
- ☆17 Updated last year
- ☆50 Updated 4 months ago
- ☆72 Updated last month
- Touchstone: Evaluating Vision-Language Models by Language Models ☆83 Updated last year
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations ☆87 Updated 11 months ago
- This is the code repo for our paper "Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts". ☆36 Updated 3 months ago
- MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua… ☆59 Updated last month
- ZeroGUI: Automating Online GUI Learning at Zero Human Cost ☆69 Updated last week
- ☆42 Updated last week
- ☆73 Updated last year
- ACL 2025: Synthetic data generation pipelines for text-rich images. ☆87 Updated 4 months ago
- Code for Paper: Harnessing Webpage Uis For Text Rich Visual Understanding ☆52 Updated 7 months ago
- ☆59 Updated 3 weeks ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆33 Updated last year
- Automatic prompt optimization framework for multi-step agent tasks. ☆31 Updated 8 months ago
- ☆64 Updated last year
- ☆56 Updated 3 weeks ago
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM. ☆37 Updated 10 months ago
- ☆85 Updated last month
- [ICCV 2025] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆164 Updated 3 months ago