will-singularity / Skywork-MM
Empirical Study Towards Building An Effective Multi-Modal Large Language Model
☆22 Updated last year
Alternatives and similar repositories for Skywork-MM
Users interested in Skywork-MM are comparing it to the repositories listed below
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆43 Updated 11 months ago
- ☆73 Updated last year
- ☆17 Updated last year
- Touchstone: Evaluating Vision-Language Models by Language Models ☆83 Updated last year
- Our 2nd-gen LMM ☆33 Updated last year
- ☆21 Updated last year
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder; unifies image understanding and generation. ☆37 Updated 11 months ago
- Lion: Kindling Vision Intelligence within Large Language Models ☆52 Updated last year
- ☆29 Updated 9 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment ☆34 Updated 11 months ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆47 Updated 5 months ago
- ☆28 Updated last year
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora ☆41 Updated last year
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 Updated 8 months ago
- An LMM that addresses catastrophic forgetting (AAAI 2025) ☆43 Updated last month
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆84 Updated 7 months ago
- ☆56 Updated last week
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆60 Updated 7 months ago
- ☆32 Updated 4 months ago
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025] ☆66 Updated last month
- ☆87 Updated 11 months ago
- A lightweight, highly efficient training framework for accelerating diffusion tasks. ☆47 Updated 8 months ago
- ☆36 Updated 8 months ago
- Official code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation ☆138 Updated 7 months ago
- A Dead Simple and Modularized Multi-Modal Training and Fine-tuning Framework. Compatible with any LLaVA/Flamingo/QwenVL/MiniGemini etc. series … ☆19 Updated last year
- This is the official repo for ByteVideoLLM/Dynamic-VLM ☆20 Updated 5 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆43 Updated 3 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆103 Updated last week
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR 2025] ☆15 Updated 3 months ago
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" ☆25 Updated 3 weeks ago