arielnlee / LLaVA-1.6-ft
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
☆36 · Updated last year
Alternatives and similar repositories for LLaVA-1.6-ft:
Users who are interested in LLaVA-1.6-ft are comparing it to the repositories listed below:
- LLaVA-NeXT-Image-Llama3-Lora, modified from https://github.com/arielnlee/LLaVA-1.6-ft ☆44 · Updated 8 months ago
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model" ☆89 · Updated last year
- ☆133 · Updated last year
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆149 · Updated 6 months ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ☆145 · Updated 9 months ago
- ☆62 · Updated last year
- ☆65 · Updated 8 months ago
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions ☆55 · Updated 11 months ago
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆316 · Updated 8 months ago
- This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?" ☆128 · Updated 9 months ago
- [ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts" ☆76 · Updated 10 months ago
- InstructionGPT-4 ☆39 · Updated last year
- Plotting heatmaps with the self-attention of the [CLS] tokens in the last layer. ☆44 · Updated 2 years ago
- [NeurIPS 2024] Dense Connector for MLLMs ☆157 · Updated 5 months ago
- [ICLR 2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant ☆234 · Updated 7 months ago
- Matryoshka Multimodal Models ☆98 · Updated 2 months ago
- [arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆81 · Updated 11 months ago
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" ☆119 · Updated 9 months ago
- Densely Captioned Images (DCI) dataset repository. ☆175 · Updated 9 months ago
- [CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding ☆45 · Updated 8 months ago
- [NeurIPS 2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators" ☆104 · Updated last year
- Official repo for StableLLAVA ☆95 · Updated last year
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models ☆79 · Updated 5 months ago
- A collection of visual instruction tuning datasets. ☆76 · Updated last year
- ☆44 · Updated 10 months ago
- ☆143 · Updated 5 months ago
- Training code for CLIP-FlanT5 ☆26 · Updated 8 months ago
- The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" ☆202 · Updated last year
- Code and Models for "GeneCIS: A Benchmark for General Conditional Image Similarity" ☆57 · Updated last year
- [ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning ☆48 · Updated last month