Alpha-VLLM / WeMix-LLM
☆17Updated last year
Alternatives and similar repositories for WeMix-LLM:
Users that are interested in WeMix-LLM are comparing it to the libraries listed below
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆41Updated 8 months ago
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model☆23Updated last year
- ☆73Updated last year
- ☆36Updated 6 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models☆76Updated last year
- ☆29Updated 7 months ago
- Converting Mixtral-8x7B to Mixtral-[1~7]x7B☆22Updated last year
- Touchstone: Evaluating Vision-Language Models by Language Models☆82Updated last year
- ☆49Updated last year
- An LMM that solves catastrophic forgetting (AAAI 2025)☆39Updated 4 months ago
- code for Scaling Laws of RoPE-based Extrapolation☆72Updated last year
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder. Unifies image understanding and generation.☆35Updated 9 months ago
- Synthetic data generation pipelines for text-rich images.☆49Updated 3 weeks ago
- ☆44Updated 2 weeks ago
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]☆14Updated 3 weeks ago
- Open-Pandora: On-the-fly Control Video Generation☆32Updated 3 months ago
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2☆117Updated 4 months ago
- Our 2nd-gen LMM☆33Updated 10 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆96Updated last month
- ☆45Updated 9 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments"☆58Updated last year
- Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation☆138Updated 4 months ago
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild☆47Updated last year
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; supports both image and video…☆30Updated 9 months ago
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models☆38Updated 4 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆45Updated 2 months ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models☆79Updated 8 months ago
- Official repository of MMDU dataset☆86Updated 5 months ago
- Unleashing the Power of Cognitive Dynamics on Large Language Models☆60Updated 6 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment☆33Updated 8 months ago