ByungKwanLee / MoAI
[ECCV 2024] Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve performance on numerous zero-shot vision-language tasks.
☆319 · Updated 11 months ago
Alternatives and similar repositories for MoAI:
Users interested in MoAI are comparing it to the libraries listed below.
- [EMNLP 2024] Official PyTorch implementation code for realizing the technical part of Traversal of Layers (TroL) presenting new propagati… ☆90 · Updated 8 months ago
- [ACL 2024 Findings] Official PyTorch implementation code for realizing the technical part of CoLLaVO: Crayon Large Language and Vision mOdel ☆95 · Updated 8 months ago
- [NeurIPS 2024] Official PyTorch implementation code for realizing the technical part of Mamba-based traversal of rationale (Meteor) to im… ☆109 · Updated 9 months ago
- LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer ☆367 · Updated last month
- Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs ☆616 · Updated last month
- A family of highly capable yet efficient large multimodal models ☆176 · Updated 6 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆233 · Updated 2 months ago
- An open-source implementation for fine-tuning Phi3-Vision and Phi3.5-Vision by Microsoft ☆87 · Updated 2 weeks ago
- LLM2CLIP makes a SOTA pretrained CLIP model even stronger ☆476 · Updated last month
- [ICLR 2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant ☆230 · Updated 6 months ago
- Rethinking Step-by-step Visual Reasoning in LLMs ☆259 · Updated last month
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆315 · Updated 7 months ago
- Python library to evaluate VLMs' robustness across diverse benchmarks ☆192 · Updated this week
- Official implementation of project Honeybee (CVPR 2024) ☆444 · Updated 9 months ago
- LLaVA-Interactive-Demo ☆364 · Updated 7 months ago
- [AAAI 2025] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference ☆266 · Updated last month
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆194 · Updated last month
- When do we not need larger vision models? ☆372 · Updated 3 weeks ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆726 · Updated last year
- Implementation of PaLI-3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger" ☆145 · Updated last month
- Code/Data for the paper "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding" ☆265 · Updated 8 months ago
- Official implementation of the Law of Vision Representation in MLLMs ☆150 · Updated 3 months ago
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning ☆353 · Updated 6 months ago
- Official repository of the paper "What If We Recaption Billions of Web Images with LLaMA-3?" ☆127 · Updated 8 months ago
- Long Context Transfer from Language to Vision ☆364 · Updated 3 months ago
- HPT - Open Multimodal LLMs from HyperGAI ☆313 · Updated 9 months ago
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …☆475Updated 6 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆146 · Updated 5 months ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ☆143 · Updated 8 months ago
- ☆374 · Updated 2 months ago