PKU-YuanGroup / LLaVA-o1
☆53 · Updated 2 months ago
Alternatives and similar repositories for LLaVA-o1:
Users interested in LLaVA-o1 are comparing it to the repositories listed below.
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆189 · Updated last month
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆128 · Updated last month
- OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation ☆62 · Updated 2 weeks ago
- Parameter-efficient fine-tuning script for Phi-3-vision, the strong multimodal language model by Microsoft ☆57 · Updated 7 months ago
- Rethinking Step-by-step Visual Reasoning in LLMs ☆240 · Updated 3 weeks ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆73 · Updated 3 months ago
- FuseAI Project ☆83 · Updated 3 weeks ago
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding" ☆46 · Updated 2 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation (arXiv 2024) ☆48 · Updated 2 weeks ago
- A minimal implementation of a LLaVA-style VLM with interleaved image, text, and video processing ☆89 · Updated last month
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆114 · Updated 3 months ago
- ☆29 · Updated 3 weeks ago
- ☆68 · Updated 7 months ago
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆140 · Updated 3 weeks ago
- Resources for the paper "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆81 · Updated 3 months ago
- Official code of "Virgo: A Preliminary Exploration on Reproducing o1-like MLLM" ☆86 · Updated last month
- An open-source implementation for fine-tuning Molmo-7B-D and Molmo-7B-O by allenai ☆44 · Updated 3 weeks ago
- ApolloMoE: Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts ☆38 · Updated 2 months ago
- Long Context Transfer from Language to Vision ☆360 · Updated 2 months ago
- A collection of fine-tuning scripts to help researchers fine-tune Qwen2-VL on HuggingFace datasets ☆62 · Updated 4 months ago
- Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya ☆107 · Updated 2 weeks ago
- ☆27 · Updated 5 months ago
- Code for the ScribeAgent paper ☆49 · Updated last month
- ☆59 · Updated this week
- Train, tune, and run inference with the Bamba model ☆83 · Updated 3 weeks ago
- ☆158 · Updated last month
- Tracks the latest multimodal AI models, including multimodal foundation models, LLMs, agents, audio, image, video, music and 3D… ☆34 · Updated last week