Reproduction of LLaVA-v1.5 based on the Llama-3-8B LLM backbone.
☆65 · Oct 25, 2024 · Updated last year
Alternatives and similar repositories for LLaVA-Llama-3
Users who are interested in LLaVA-Llama-3 are comparing it to the libraries listed below.
- [ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning ☆160 · Aug 8, 2025 · Updated 7 months ago
- [ECCV’24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models" ☆21 · Mar 26, 2025 · Updated 11 months ago
- ☆12 · Dec 20, 2024 · Updated last year
- This is the official code for the paper "Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaborati… ☆12 · Aug 13, 2024 · Updated last year
- 🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3) ☆848 · Aug 5, 2025 · Updated 7 months ago
- How to export PyTorch models with unsupported layers to ONNX and then to Intel OpenVINO ☆28 · Feb 20, 2025 · Updated last year
- [CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model ☆19 · Apr 16, 2024 · Updated last year
- Official Repo for FoodieQA paper (EMNLP 2024) ☆19 · Jun 26, 2025 · Updated 8 months ago
- [ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant ☆247 · Aug 14, 2024 · Updated last year
- Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler] ☆40 · Jan 4, 2024 · Updated 2 years ago
- The official implement of Freeze-Omni. ☆15 · Jul 10, 2025 · Updated 8 months ago
- Microsoft Phi 2 Streamlit App, deployed on HuggingFace Spaces is based on the Microsoft Phi 2 small language model (SLM) for text generat… ☆14 · May 1, 2024 · Updated last year
- ☆16 · Oct 21, 2024 · Updated last year
- A "Universe Roaming Guide" bot built on the PaddlePaddle and wechaty frameworks ☆17 · Aug 3, 2021 · Updated 4 years ago
- [BMVC 2024 Oral ✨] Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization ☆20 · Sep 11, 2024 · Updated last year
- ☆15 · Updated this week
- PyTorch reimplementation of "LayoutGAN: Generating Graphic Layouts with Wireframe Discriminators", published in ICLR 2019 ☆18 · Sep 13, 2021 · Updated 4 years ago
- ☆25 · Feb 2, 2025 · Updated last year
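- [WIP@Oct 13] 质衡 benchmark (Q-Bench in Chinese), including Chinese versions of the [low-level visual question answering] and [low-level visual description] datasets, plus image quality assessment under Chinese prompts. We will release Q-Bench in more languages in the futu… ☆24 · Jan 7, 2024 · Updated 2 years ago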
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆174 · Sep 25, 2024 · Updated last year
- The efficient tuning method for VLMs ☆81 · Mar 10, 2024 · Updated 2 years ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆20 · May 27, 2025 · Updated 9 months ago
- ☆20 · Apr 8, 2025 · Updated 11 months ago
- A Dead Simple and Modularized Multi-Modal Training and Finetune Framework. Compatible with any LLaVA/Flamingo/QwenVL/MiniGemini etc series … ☆19 · Apr 24, 2024 · Updated last year
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆138 · May 8, 2025 · Updated 10 months ago
- Tianchi NVIDIA TensorRT Hackathon 2023, Generative AI Model Optimization Competition: third-place solution in the preliminary round ☆49 · Aug 16, 2023 · Updated 2 years ago
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆281 · Jun 25, 2024 · Updated last year
- FreeDA: Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation (CVPR 2024) ☆49 · Aug 28, 2024 · Updated last year
- This repository lists some awesome public projects about Zero-shot/Few-shot Learning based on CLIP (Contrastive Language-Image Pre-Traini… ☆27 · Nov 28, 2024 · Updated last year
- Diffusion Reflectance Map: Single-Image Stochastic Inverse Rendering of Illumination and Reflectance ☆23 · Jan 20, 2025 · Updated last year
- [CVPR'24] Code for Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models ☆18 · Jul 22, 2024 · Updated last year
- [EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" ☆108 · Aug 21, 2025 · Updated 6 months ago
- Chinese CLIP models with SOTA performance. ☆60 · Aug 28, 2023 · Updated 2 years ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs ☆98 · Jan 16, 2025 · Updated last year
- [ICLR 2025] Official PyTorch implementation of "DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation" ☆26 · Jul 11, 2025 · Updated 7 months ago
- Official Implementation (PyTorch) of "DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Represe… ☆27 · Jun 24, 2024 · Updated last year
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression ☆67 · Feb 19, 2025 · Updated last year
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆77 · Nov 20, 2025 · Updated 3 months ago
- ☆29 · May 6, 2020 · Updated 5 years ago