Alpha-VLLM / WeMix-LLM
☆17Updated last year
Alternatives and similar repositories for WeMix-LLM:
Users who are interested in WeMix-LLM are comparing it to the libraries listed below.
- ☆28Updated 6 months ago
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model☆23Updated last year
- Touchstone: Evaluating Vision-Language Models by Language Models☆82Updated last year
- An LMM that solves catastrophic forgetting (AAAI 2025)☆39Updated 4 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆40Updated 8 months ago
- Our 2nd-gen LMM☆33Updated 9 months ago
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild☆47Updated last year
- ☆49Updated last year
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models☆76Updated last year
- ☆73Updated last year
- ☆36Updated 6 months ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder. Unifies image understanding and generation.☆35Updated 8 months ago
- Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,…☆118Updated last week
- Converting Mixtral-8x7B to Mixtral-[1~7]x7B☆22Updated last year
- A simple MLLM that surpasses QwenVL-Max using only open-source data and a 14B LLM.☆37Updated 6 months ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆43Updated 8 months ago
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]☆14Updated 2 weeks ago
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2☆118Updated 3 months ago
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria☆64Updated 4 months ago
- Unleashing the Power of Cognitive Dynamics on Large Language Models☆60Updated 5 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆74Updated 4 months ago
- Code for "Scaling Laws of RoPE-based Extrapolation"☆70Updated last year
- Open-Pandora: On-the-fly Control Video Generation☆32Updated 3 months ago
- Synthetic data generation pipelines for text-rich images.☆44Updated last week
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆95Updated 2 weeks ago
- Official repo for StableLLAVA☆94Updated last year
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model☆34Updated 4 months ago
- ☆44Updated 8 months ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆30Updated 3 months ago
- ☆133Updated last year