fangyuan-ksgk / Mini-LLaVA
A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.
☆84Updated 2 months ago
Related projects ⓘ
Alternatives and complementary repositories for Mini-LLaVA
- E5-V: Universal Embeddings with Multimodal Large Language Models☆173Updated 4 months ago
- This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks"☆69Updated last week
- a family of highly capabale yet efficient large multimodal models☆166Updated 2 months ago
- (WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, B…☆81Updated 2 months ago
- This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES"☆91Updated 2 weeks ago
- ☆30Updated this week
- Python Library to evaluate VLM models' robustness across diverse benchmarks☆169Updated 3 weeks ago
- ☆128Updated 5 months ago
- From scratch implementation of a vision language model in pure PyTorch☆162Updated 6 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆179Updated last month
- Implementation of PALI3 from the paper PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆142Updated last week
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.☆151Updated 7 months ago
- LoRA and DoRA from Scratch Implementations☆188Updated 8 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024☆229Updated 3 weeks ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆132Updated last month
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"☆115Updated last week
- Implementation of DoRA☆283Updated 5 months ago
- Reproduction of LLaVA-v1.5 based on Llama-3-8b LLM backbone.☆59Updated 3 weeks ago
- [NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models☆118Updated 3 weeks ago
- LLM2CLIP makes SOTA pretrained CLIP model more SOTA ever.☆242Updated this week
- Code for NOLA, an implementation of "nola: Compressing LoRA using Linear Combination of Random Basis"☆49Updated 2 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆134Updated 5 months ago
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" (TMLR2024)☆184Updated this week
- My fork os allen AI's OLMo for educational purposes.☆28Updated this week
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models☆137Updated last week
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"☆123Updated 6 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code)☆135Updated last month
- Matryoshka Multimodal Models☆82Updated this week
- VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning☆87Updated 2 months ago
- ☆58Updated 8 months ago