Alpha-VLLM / WeMix-LLM
☆17Updated last year
Alternatives and similar repositories for WeMix-LLM:
Users who are interested in WeMix-LLM are comparing it to the libraries listed below.
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model☆22Updated last year
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆42Updated 10 months ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a Vision Encoder. Unifying image understanding and generation.☆38Updated 10 months ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆47Updated 4 months ago
- An LMM that solves catastrophic forgetting (AAAI 2025)☆41Updated 3 weeks ago
- ☆29Updated 8 months ago
- Our 2nd-gen LMM☆33Updated 11 months ago
- ☆51Updated last year
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆44Updated 10 months ago
- Touchstone: Evaluating Vision-Language Models by Language Models☆82Updated last year
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search"☆24Updated this week
- ☆73Updated last year
- ☆63Updated last year
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆100Updated 2 months ago
- [ACL 2024] TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild☆47Updated last year
- Synthetic data generation pipelines for text-rich images.☆63Updated 2 months ago
- Official repo for StableLLAVA☆95Updated last year
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2☆119Updated 5 months ago
- ☆43Updated last month
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆56Updated 3 weeks ago
- A simple MLLM that surpasses QwenVL-Max using only open-source data and a 14B LLM.☆37Updated 8 months ago
- Implementations of the online merging optimizers proposed in "Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment"☆75Updated 10 months ago
- ☆36Updated 8 months ago
- Official repository of MMDU dataset☆89Updated 7 months ago
- ☆46Updated 2 weeks ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆102Updated last year
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models☆76Updated last year
- ☆12Updated last year
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …☆111Updated last month
- Converting Mixtral-8x7B to Mixtral-[1~7]x7B☆22Updated last year