Victorwz / LLaVA-Llama-3
A reproduction of LLaVA-v1.5 using Llama-3-8B as the LLM backbone.
☆50 · Updated 2 months ago
Related projects:
- This is the official implementation of the paper "Needle In A Multimodal Haystack" ☆72 · Updated 2 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆128 · Updated last month
- ☆46 · Updated 10 months ago
- Touchstone: Evaluating Vision-Language Models by Language Models ☆75 · Updated 8 months ago
- ☆70 · Updated 6 months ago
- Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆103 · Updated last month
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever. ☆64 · Updated 2 weeks ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models ☆52 · Updated 2 months ago
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" ☆158 · Updated last week
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr… ☆57 · Updated 2 months ago
- Official repo for StableLLAVA ☆90 · Updated 8 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment ☆30 · Updated 2 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆36 · Updated 2 months ago
- A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, qwen-vl, phi3-v … ☆123 · Updated last week
- ☆65 · Updated last year
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations ☆47 · Updated 2 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆53 · Updated last month
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER" ☆138 · Updated last week
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆45 · Updated 4 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆36 · Updated 5 months ago
- ☆53 · Updated 7 months ago
- Dense Connector for MLLMs ☆98 · Updated last month
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆115 · Updated 2 weeks ago
- ☆100 · Updated last month
- LVBench: An Extreme Long Video Understanding Benchmark ☆51 · Updated 3 weeks ago
- MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆116 · Updated 2 weeks ago
- InstructionGPT-4 ☆35 · Updated 8 months ago
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria ☆49 · Updated last month
- GUICourse: From General Vision Language Models to Versatile GUI Agents ☆68 · Updated 2 months ago