☆401Dec 12, 2024Updated last year
Alternatives and similar repositories for llava-phi
Users who are interested in llava-phi are comparing it to the libraries listed below
- a family of highly capable yet efficient large multimodal models☆192Aug 23, 2024Updated last year
- A Framework of Small-scale Large Multimodal Models☆963Feb 7, 2026Updated 3 weeks ago
- A family of lightweight multimodal models.☆1,051Nov 18, 2024Updated last year
- Strong and Open Vision Language Assistant for Mobile Devices☆1,334Apr 15, 2024Updated last year
- [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization☆583Jun 7, 2024Updated last year
- 🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)☆848Aug 5, 2025Updated 6 months ago
- Official Repo For OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]☆1,342Oct 15, 2025Updated 4 months ago
- An open-source implementation for training LLaVA-NeXT.☆432Oct 23, 2024Updated last year
- 【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models☆2,303Jul 15, 2025Updated 7 months ago
- [CVPR2024 Highlight]GLEE: General Object Foundation Model for Images and Videos at Scale☆1,170Oct 21, 2024Updated last year
- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones☆1,307Feb 5, 2026Updated 3 weeks ago
- [ECCV 2024] The official code of paper "Open-Vocabulary SAM".☆1,029Aug 4, 2025Updated 7 months ago
- ☆124Jul 29, 2024Updated last year
- ☆4,577Sep 14, 2025Updated 5 months ago
- (AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions☆260Apr 14, 2024Updated last year
- One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks☆3,750Updated this week
- A flexible and efficient codebase for training visually-conditioned language models (VLMs)☆931Jul 4, 2024Updated last year
- TxBKG - Knowledge Graph Generation for Any PDFs☆188Nov 22, 2024Updated last year
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills☆763Feb 1, 2024Updated 2 years ago
- ☆247Nov 24, 2024Updated last year
- Welcome to the 'Open-Alteryx-Macro' project. This project is aimed at providing an open-source solution for managing and updating Alteryx…☆156May 25, 2024Updated last year
- ☆288Jul 6, 2024Updated last year
- [ICCV 2025] SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree☆549Jul 29, 2025Updated 7 months ago
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model☆281Jun 25, 2024Updated last year
- Book Recommendation System☆235May 2, 2024Updated last year
- A curated list of awesome papers related to adversarial attacks and defenses for information retrieval. If I missed any papers, feel free…☆221Jul 11, 2024Updated last year
- [ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"☆269Dec 30, 2024Updated last year
- Official repository for the paper PLLaVA☆676Jul 28, 2024Updated last year
- InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions☆2,921May 26, 2025Updated 9 months ago
- 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing imp…☆3,338Mar 5, 2024Updated last year
- A Workspace for HMI tools☆164Jul 11, 2024Updated last year
- ☆87Dec 20, 2024Updated last year
- YiTu is an easy-to-use runtime to fully exploit the hybrid parallelism of different hardwares (e.g., GPU) to efficiently support the exec…☆254Jan 7, 2026Updated last month
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"☆131Aug 21, 2024Updated last year
- PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"☆691Jan 7, 2024Updated 2 years ago
- ☆143May 25, 2024Updated last year
- Harnessing the Power of AI to Navigate the Information Age – Uncovering Truth, Promoting Transparency, and Championing Fact-Based Discour…☆147Jun 2, 2023Updated 2 years ago
- This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV2024 Oral☆397Jun 2, 2025Updated 9 months ago
- Dive into Nature Simulation v1, a dynamic ecosystem game. Experience life's balance with interactive controls and stunning visuals of flo…☆248Dec 23, 2024Updated last year