TinyLLaVA / TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
☆709 · Updated 3 weeks ago
Alternatives and similar repositories for TinyLLaVA_Factory:
Users interested in TinyLLaVA_Factory are comparing it to the libraries listed below.
- Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks ☆1,689 · Updated this week
- A family of lightweight multimodal models. ☆972 · Updated last month
- A curated list of resources dedicated to hallucination in multimodal large language models (MLLMs). ☆554 · Updated 3 weeks ago
- LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer ☆348 · Updated this week
- 【ICLR 2024🔥】Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment ☆771 · Updated 9 months ago
- Paper list about multimodal and large language models, only used to record papers I read in the daily arXiv for personal needs. ☆578 · Updated this week
- [ECCV 2024 Oral] Code for the paper "An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models" ☆335 · Updated last week
- Efficient Multimodal Large Language Models: A Survey ☆304 · Updated 5 months ago
- ☆756 · Updated 6 months ago
- Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images… ☆482 · Updated 8 months ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆720 · Updated 11 months ago
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ☆306 · Updated 9 months ago
- [CVPR 2024] OneLLM: One Framework to Align All Modalities with Language ☆613 · Updated 2 months ago
- A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, … ☆223 · Updated 3 weeks ago
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … ☆472 · Updated 5 months ago
- ☆281 · Updated 2 weeks ago
- OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ☆302 · Updated 2 months ago
- VisionLLM Series ☆977 · Updated 2 weeks ago
- A flexible and efficient codebase for training visually-conditioned language models (VLMs) ☆543 · Updated 6 months ago
- RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness ☆273 · Updated last month
- LLM2CLIP makes SOTA pretrained CLIP models even more SOTA. ☆442 · Updated this week
- LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024) ☆758 · Updated 5 months ago
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI" ☆378 · Updated this week
- ☆588 · Updated 11 months ago
- LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆908 · Updated last month
- Strong and Open Vision Language Assistant for Mobile Devices ☆1,106 · Updated 9 months ago
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ☆324 · Updated this week
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that… ☆814 · Updated last month
- When do we not need larger vision models? ☆354 · Updated last month
- [CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation ☆302 · Updated 4 months ago