kq-chen / qwen-vl-utilsLinks

helper functions for processing and integrating visual language information with Qwen-VL Series Model

☆15

Alternatives and similar repositories for qwen-vl-utils

Users that are interested in qwen-vl-utils are comparing it to the libraries listed below

Sorting:

SparksJoe / Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆43Updated last year
RhapsodyAILab / MiniCPM-V-Embedding
☆29Updated last year
FudanNLPLAB / MouSi
☆75Updated last year
SkyworkAI / Skywork-MoE
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
☆138Updated last year
locuslab / scaling_laws_data_filtering
☆65Updated last year
mlfoundations / VisIT-Bench
☆50Updated 2 years ago
tianyu-z / VCR
Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.
☆31Updated 8 months ago
zhaochenyang20 / Prompt2Model-Self-Guide
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper
☆33Updated last year
yyht / openrlhf_async_pipline
☆86Updated 3 months ago
MBZUAI-LLM / web2code
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
☆92Updated last year
360CVGroup / 360VL
Our 2nd-gen LMM
☆34Updated last year
microsoft / DELT
DELT: Data Efficacy for Language Model Training
☆42Updated 2 months ago
will-singularity / Skywork-MM
Empirical Study Towards Building An Effective Multi-Modal Large Language Model
☆22Updated 2 years ago
RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆109Updated 5 months ago
OFA-Sys / TouchStone
Touchstone: Evaluating Vision-Language Models by Language Models
☆83Updated last year
TIGER-AI-Lab / VisualWebInstruct
The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" [EMNLP25]
☆35Updated 2 months ago
shulin16 / MMInA
[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents
☆47Updated 8 months ago
bytedance / MTVQA
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua…
☆64Updated 6 months ago
Alpha-VLLM / WeMix-LLM
☆17Updated 2 years ago
neulab / MultiUI
Code for Paper: Harnessing Webpage Uis For Text Rich Visual Understanding
☆53Updated 11 months ago
sail-sg / scaling-with-vocab
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
☆89Updated last year
efficientscaling / Z1
[EMNLP'2025 Industry] Repo for "Z1: Efficient Test-time Scaling with Code"
☆66Updated 7 months ago
jdf-prog / LLM-Engines
☆50Updated 5 months ago
leezythu / FocusLLM
FocusLLM: Scaling LLM’s Context by Parallel Decoding
☆43Updated 11 months ago
yuyq96 / TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
☆63Updated last year
Infini-AI-Lab / S2FT
☆19Updated 10 months ago
YuchuanTian / RethinkTinyLM
[ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”
☆125Updated 10 months ago
haon-chen / MoCa
☆61Updated 3 months ago
apple / ml-mia-bench
This repo contains code and data for ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
☆34Updated 8 months ago
Zheng0428 / COIG-Kun
☆36Updated last year