DLYuanGod / TinyGPT-V
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
☆1,290 · Updated last year
Alternatives and similar repositories for TinyGPT-V
Users interested in TinyGPT-V are comparing it to the libraries listed below.
- [EMNLP 2024 🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection ☆3,281 · Updated 6 months ago
- Mixture-of-Experts for Large Vision-Language Models ☆2,181 · Updated 6 months ago
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI ☆1,386 · Updated last year
- An Open-source Toolkit for LLM Development ☆2,784 · Updated 5 months ago
- MINT-1T: A one trillion token multimodal interleaved dataset. ☆817 · Updated 10 months ago
- Reaching LLaMA2 Performance with 0.1M Dollars ☆983 · Updated 11 months ago
- ☆710 · Updated last year
- A family of lightweight multimodal models. ☆1,024 · Updated 7 months ago
- The official implementation of Self-Play Fine-Tuning (SPIN) ☆1,166 · Updated last year
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆744 · Updated last year
- ☆4,088 · Updated last year
- [ICLR-2025-SLLM Spotlight 🔥] MobiLlama: Small Language Model tailored for edge devices ☆648 · Updated last month
- Official implementation of the paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" ☆860 · Updated last month
- Emu Series: Generative Multimodal Models from BAAI ☆1,730 · Updated 8 months ago
- 🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3) ☆838 · Updated 11 months ago
- Training LLMs with QLoRA + FSDP ☆1,487 · Updated 7 months ago
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆2,018 · Updated 10 months ago
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text. ☆932 · Updated 3 months ago
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,545 · Updated last year
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,499 · Updated last year
- Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral) ☆2,666 · Updated 10 months ago
- Run Mixtral-8x7B models in Colab or on consumer desktops ☆2,312 · Updated last year
- Strong and Open Vision Language Assistant for Mobile Devices ☆1,235 · Updated last year
- This repository provides the code and model checkpoints for the AIMv1 and AIMv2 research projects. ☆1,305 · Updated 2 months ago
- ICLR 2024 Spotlight: curation/training code, metadata, distribution, and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Expert… ☆1,460 · Updated 3 months ago
- [CVPR 2024] OneLLM: One Framework to Align All Modalities with Language ☆647 · Updated 8 months ago
- LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transform… ☆1,459 · Updated last year
- Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" ☆3,288 · Updated last year
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,913 · Updated 7 months ago
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection ☆1,570 · Updated 7 months ago