TinyLLaVA / TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
☆775 · Updated last month
Alternatives and similar repositories for TinyLLaVA_Factory:
Users interested in TinyLLaVA_Factory are comparing it to the libraries listed below:
- A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision,… ☆278 · Updated 3 weeks ago
- [ECCV 2024 Oral] Code for the paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua… ☆384 · Updated 2 months ago
- Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks ☆2,030 · Updated this week
- An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud ☆492 · Updated this week
- 📖 A curated list of resources dedicated to hallucination in multimodal large language models (MLLMs) ☆603 · Updated 3 months ago
- LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer ☆369 · Updated this week
- [CVPR'25] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness ☆319 · Updated 2 weeks ago
- Efficient Multimodal Large Language Models: A Survey ☆327 · Updated 3 weeks ago
- A paper list of recent works on token compression for ViT and VLM ☆377 · Updated 2 weeks ago
- A family of lightweight multimodal models ☆1,006 · Updated 4 months ago
- ☆339 · Updated last month
- LLM2CLIP makes SOTA pretrained CLIP models even more SOTA ☆486 · Updated 2 months ago
- [CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allo… ☆323 · Updated 7 months ago
- [ICLR 2024] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment ☆795 · Updated 11 months ago
- 📖 A repository for organizing papers, code, and other resources related to unified multimodal models ☆415 · Updated last week
- A paper list about multimodal and large language models, used to record papers the author reads from the daily arXiv ☆608 · Updated this week
- Evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E…" ☆403 · Updated 2 weeks ago
- ☆772 · Updated 8 months ago
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ☆308 · Updated 11 months ago
- Aligning LMMs with Factually Augmented RLHF ☆358 · Updated last year
- [ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation ☆1,275 · Updated last week
- Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag… ☆506 · Updated 11 months ago