open-compass / VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
☆1,856 Updated this week
Alternatives and similar repositories for VLMEvalKit:
Users who are interested in VLMEvalKit are comparing it to the libraries listed below.
- A Framework of Small-scale Large Multimodal Models ☆745 Updated 3 weeks ago
- InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions ☆2,755 Updated 3 weeks ago
- A family of lightweight multimodal models. ☆987 Updated 3 months ago
- ☆3,423 Updated last week
- Next-Token Prediction is All You Need ☆2,004 Updated 3 months ago
- Mixture-of-Experts for Large Vision-Language Models ☆2,082 Updated 2 months ago
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,854 Updated 3 months ago
- 📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM). ☆586 Updated last month
- ☆765 Updated 7 months ago
- 【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment ☆784 Updated 10 months ago
- 🔥🔥🔥 Latest Papers, Codes and Datasets on Vid-LLMs. ☆1,951 Updated 3 weeks ago
- O1 Replication Journey ☆1,947 Updated last month
- VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs ☆1,078 Updated 3 weeks ago
- Strong and Open Vision Language Assistant for Mobile Devices ☆1,139 Updated 10 months ago
- Emu Series: Generative Multimodal Models from BAAI ☆1,683 Updated 4 months ago
- An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...) ☆4,246 Updated 3 weeks ago
- ☆320 Updated last week
- Famous Vision Language Models and Their Architectures ☆646 Updated last week
- GPT4V-level open-source multi-modal model based on Llama3-8B ☆2,262 Updated 5 months ago
- VisionLLM Series ☆1,002 Updated 2 weeks ago
- OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models ☆1,651 Updated last month
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆2,916 Updated last week
- Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs. ☆592 Updated this week
- An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT) ☆4,895 Updated this week
- Recent LLM-based CV and related works. Welcome to comment/contribute! ☆853 Updated 8 months ago
- ✨✨ Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis ☆461 Updated 2 months ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ☆828 Updated 2 months ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆722 Updated last year