Alpha-VLLM / WeMix-LLM
☆17Updated last year
Alternatives and similar repositories for WeMix-LLM
Users interested in WeMix-LLM are comparing it to the libraries listed below.
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model☆22Updated last year
- ☆36Updated 10 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆44Updated last year
- Touchstone: Evaluating Vision-Language Models by Language Models☆83Updated last year
- Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,…☆123Updated last month
- code for Scaling Laws of RoPE-based Extrapolation☆73Updated last year
- ☆50Updated last year
- Scaling Preference Data Curation via Human-AI Synergy☆69Updated last week
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild☆47Updated last year
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2☆123Updated 7 months ago
- ☆73Updated last year
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”☆121Updated 6 months ago
- Reformatted Alignment☆113Updated 9 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆85Updated 8 months ago
- ☆69Updated last month
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆50Updated 7 months ago
- LMM that mitigates catastrophic forgetting, AAAI 2025☆44Updated 2 months ago
- ☆29Updated 10 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models☆78Updated last year
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆105Updated last year
- Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation☆140Updated 8 months ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"☆35Updated last year
- ☆48Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model☆266Updated last year
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆62Updated 3 months ago
- ACL 2025: Synthetic data generation pipelines for text-rich images.☆87Updated 4 months ago
- LLaVA combined with the MAGVIT image tokenizer, training an MLLM without a vision encoder. Unifies image understanding and generation.☆37Updated last year
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆44Updated last year
- Official implementation of “Training on the Benchmark Is Not All You Need”.☆34Updated 6 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models☆135Updated last year