Alpha-VLLM / WeMix-LLM
☆17 Updated last year
Alternatives and similar repositories for WeMix-LLM
Users who are interested in WeMix-LLM are comparing it to the libraries listed below
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model ☆22 Updated last year
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆43 Updated 11 months ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆47 Updated 5 months ago
- Touchstone: Evaluating Vision-Language Models by Language Models ☆83 Updated last year
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆77 Updated last year
- ☆36 Updated 8 months ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder. Unifies image understanding and generation. ☆37 Updated 11 months ago
- ☆73 Updated last year
- ☆51 Updated last year
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild ☆46 Updated last year
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" ☆25 Updated 3 weeks ago
- Open-Pandora: On-the-fly Control Video Generation ☆34 Updated 6 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆103 Updated last week
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 Updated last year
- ☆29 Updated 9 months ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models ☆44 Updated 11 months ago
- code for Scaling Laws of RoPE-based Extrapolation ☆73 Updated last year
- Converting Mixtral-8x7B to Mixtral-[1~7]x7B ☆22 Updated last year
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context" ☆34 Updated 10 months ago
- ☆102 Updated last month
- Unleashing the Power of Cognitive Dynamics on Large Language Models ☆61 Updated 8 months ago
- Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,… ☆122 Updated this week
- ☆46 Updated last month
- LMM solved catastrophic forgetting, AAAI2025 ☆43 Updated last month
- Official repo for StableLLAVA ☆95 Updated last year
- This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR2025] ☆66 Updated last month
- ☆63 Updated last year
- Official repository of MMDU dataset ☆91 Updated 8 months ago
- ☆99 Updated last year
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆57 Updated last month