csuhan / OneLLM
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
☆652 · Updated 10 months ago
Alternatives and similar repositories for OneLLM
Users interested in OneLLM are comparing it to the libraries listed below.
- 【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment ☆822 · Updated last year
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆757 · Updated last year
- ☆621 · Updated last year
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ☆316 · Updated last year
- VisionLLM Series ☆1,098 · Updated 5 months ago
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆454 · Updated 8 months ago
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆590 · Updated 10 months ago
- A Framework of Small-scale Large Multimodal Models ☆874 · Updated 3 months ago
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … ☆494 · Updated last year
- PyTorch Implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs" ☆665 · Updated last year
- [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding ☆642 · Updated 6 months ago
- Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag… ☆534 · Updated last year
- [NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" ☆522 · Updated last year
- [AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference ☆285 · Updated 7 months ago
- LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer ☆383 · Updated 4 months ago
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ☆346 · Updated 7 months ago
- ✨✨ Woodpecker: Hallucination Correction for Multimodal Large Language Models ☆639 · Updated 8 months ago
- Official implementation of SEED-LLaMA (ICLR 2024). ☆620 · Updated 11 months ago
- A family of lightweight multimodal models. ☆1,029 · Updated 9 months ago
- Research Trends in LLM-guided Multimodal Learning. ☆356 · Updated last year
- [ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model ☆334 · Updated 9 months ago
- LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024) ☆832 · Updated last year
- Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Mod… ☆337 · Updated 5 months ago
- ☆790 · Updated last year
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ☆386 · Updated last year
- (CVPR 2024) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding ☆324 · Updated last year
- MMICL: a state-of-the-art VLM with in-context learning (ICL) ability, from PKU ☆352 · Updated last year
- Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling" ☆858 · Updated 11 months ago
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" ☆861 · Updated 3 months ago
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) ☆307 · Updated 7 months ago