csuhan / OneLLM
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
☆654 · Updated 11 months ago
Alternatives and similar repositories for OneLLM
Users interested in OneLLM are comparing it to the libraries listed below.
- [ICLR 2024 🔥] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment · ☆831 · Updated last year
- [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding · ☆652 · Updated 8 months ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills · ☆758 · Updated last year
- ☆628 · Updated last year
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents · ☆316 · Updated last year
- VisionLLM Series · ☆1,112 · Updated 7 months ago
- PyTorch Implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs" · ☆676 · Updated last year
- A Framework of Small-scale Large Multimodal Models · ☆905 · Updated 5 months ago
- [AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference · ☆289 · Updated 9 months ago
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … · ☆497 · Updated last year
- Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Mod… · ☆342 · Updated 6 months ago
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation · ☆459 · Updated 10 months ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… · ☆919 · Updated 2 months ago
- LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024) · ☆838 · Updated last year
- Efficient Multimodal Large Language Models: A Survey · ☆373 · Updated 5 months ago
- [CVPR 2024] A benchmark for evaluating Multimodal LLMs using multiple-choice questions · ☆350 · Updated 9 months ago
- [CVPR 2024] MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding · ☆333 · Updated last year
- [NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" · ☆523 · Updated last year
- LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer · ☆387 · Updated 5 months ago
- Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling" · ☆861 · Updated last year
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content · ☆594 · Updated last year
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models · ☆274 · Updated last year
- Research Trends in LLM-guided Multimodal Learning · ☆355 · Updated last year
- Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag… · ☆539 · Updated last year
- [NeurIPS 2024] A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing · ☆569 · Updated 11 months ago
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey · ☆450 · Updated 8 months ago
- [ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model · ☆339 · Updated 11 months ago
- ☆797 · Updated last year
- ✨✨ Woodpecker: Hallucination Correction for Multimodal Large Language Models · ☆639 · Updated 9 months ago
- Long Context Transfer from Language to Vision · ☆394 · Updated 6 months ago