will-singularity / Skywork-MMLinks

Empirical Study Towards Building An Effective Multi-Modal Large Language Model

☆22

Alternatives and similar repositories for Skywork-MM

Users that are interested in Skywork-MM are comparing it to the libraries listed below

Sorting:

gpt4video / GPT4Video
Offical Code for GPT4Video: A Unified Multimodal Large Language Model for lnstruction-Followed Understanding and Safety-Aware Generation
☆144Updated last year
MBZUAI-LLM / web2code
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
☆97Updated last year
SparksJoe / Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆43Updated last year
Alpha-VLLM / WeMix-LLM
☆17Updated 2 years ago
FudanNLPLAB / MouSi
☆75Updated last year
BytedanceDouyinContent / SAIL-VL2
The SAIL-VL2 series model developed by the BytedanceDouyinContent Group
☆75Updated 2 months ago
sterzhang / image-textualization
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)
☆169Updated last year
scenarios / WeMM
☆87Updated last year
OFA-Sys / TouchStone
Touchstone: Evaluating Vision-Language Models by Language Models
☆83Updated last year
mynameischaos / Lion
Lion: Kindling Vision Intelligence within Large Language Models
☆51Updated last year
jihaonew / MM-Instruct
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
☆35Updated last year
icoz69 / StableLLAVA
Official repo for StableLLAVA
☆95Updated last year
huggingface / docmatix
A huge dataset for Document Visual Question Answering
☆20Updated last year
QQ-MM / PureMM
☆21Updated last year
pkunlp-icler / PCA-EVAL
[ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain
☆104Updated last year
OpenGVLab / MM-Interleaved
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
☆247Updated last year
thu-ml / zh-clip
☆72Updated 2 years ago
WePOINTS / WePOINTS
☆186Updated 10 months ago
baaivision / CapsFusion
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
☆212Updated last year
yuyq96 / TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
☆64Updated last year
patrick-tssn / Awesome-Colorful-LLM
Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,…
☆124Updated 6 months ago
X2FD / LVIS-INSTRUCT4V
☆133Updated last year
OpenGVLab / MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…
☆117Updated last year
TIGER-AI-Lab / MEGA-Bench
This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]
☆77Updated 5 months ago
thunlp / Muffin
☆66Updated last year
OpenGVLab / ControlLLM
ControlLLM: Augment Language Models with Tools by Searching on Graphs
☆194Updated last year
RhapsodyAILab / MiniCPM-V-Embedding
☆29Updated last year
MAmmoTH-VL / MAmmoTH-VL
(ACL 2025) MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
☆48Updated 6 months ago
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆277Updated last year
mulanai / MuLan
MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (无需额外训练为任意扩散模型支持多语言能力)
☆143Updated 10 months ago