fangyuan-ksgk / Mini-LLaVA
A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.
☆83Updated last month
Related projects
Alternatives and complementary repositories for Mini-LLaVA
- This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks"☆62Updated this week
- Python Library to evaluate VLM models' robustness across diverse benchmarks☆168Updated last week
- ☆126Updated 5 months ago
- A family of highly capable yet efficient large multimodal models☆161Updated 2 months ago
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆144Updated last week
- This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES"☆88Updated last week
- E5-V: Universal Embeddings with Multimodal Large Language Models☆168Updated 3 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆178Updated last month
- (WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, B…☆81Updated 2 months ago
- ☆58Updated 4 months ago
- Object Recognition as Next Token Prediction (CVPR 2024 Highlight)☆160Updated last month
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code)☆133Updated last month
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆131Updated last month
- Reproduction of LLaVA-v1.5 based on Llama-3-8b LLM backbone.☆58Updated 2 weeks ago
- Megatron's multi-modal data loader☆130Updated this week
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.☆151Updated 7 months ago
- LLaVA-HR: High-Resolution Large Language-Vision Assistant☆213Updated 2 months ago
- [EMNLP 2024] Official PyTorch implementation code for realizing the technical part of Traversal of Layers (TroL) presenting new propagati…☆84Updated 4 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"☆91Updated last month
- Quick exploration into fine-tuning Florence-2☆267Updated last month
- [NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models☆108Updated last week
- Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft.☆53Updated 4 months ago
- ☆57Updated 7 months ago
- ☆29Updated 3 weeks ago
- LoRA and DoRA from Scratch Implementations☆188Updated 8 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024☆203Updated 2 weeks ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts☆134Updated 5 months ago
- ☆170Updated last month
- From-scratch implementation of a vision language model in pure PyTorch☆160Updated 6 months ago
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"☆95Updated 2 months ago