PKU-YuanGroup / LLaVA-o1
☆56 Updated 3 months ago
Alternatives and similar repositories for LLaVA-o1:
Users who are interested in LLaVA-o1 are comparing it to the repositories listed below:
- ☆31 Updated last month
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆57 Updated 3 weeks ago
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding" ☆48 Updated 3 months ago
- Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft. ☆58 Updated 9 months ago
- OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation ☆68 Updated last week
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… ☆54 Updated 4 months ago
- ☆29 Updated 7 months ago
- ☆60 Updated last month
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks ☆67 Updated last week
- Reproduction of LLaVA-v1.5 based on Llama-3-8b LLM backbone. ☆64 Updated 4 months ago
- Rethinking Step-by-step Visual Reasoning in LLMs ☆275 Updated last month
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆75 Updated 4 months ago
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability. ☆89 Updated 3 months ago
- A novel multi-modality (vision) RAG architecture ☆23 Updated 5 months ago
- FuseAI Project ☆83 Updated last month
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆146 Updated 2 months ago
- ☆23 Updated 6 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆199 Updated 2 months ago
- ☆57 Updated 8 months ago
- ☆35 Updated 2 weeks ago
- ☆36 Updated last year
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents ☆185 Updated 2 weeks ago
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT) ☆105 Updated last month
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch ☆86 Updated 2 months ago