csuhan / OneLLM
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
☆653 · Updated 10 months ago
Alternatives and similar repositories for OneLLM
Users who are interested in OneLLM are comparing it to the libraries listed below
- 【ICLR 2024🔥】Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment ☆825 · Updated last year
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆759 · Updated last year
- ☆624 · Updated last year
- [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding ☆644 · Updated 7 months ago
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ☆317 · Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆455 · Updated 9 months ago
- [NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" ☆524 · Updated last year
- VisionLLM Series ☆1,105 · Updated 6 months ago
- LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024) ☆837 · Updated last year
- A Framework of Small-scale Large Multimodal Models ☆893 · Updated 4 months ago
- (CVPR 2024) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding ☆328 · Updated last year
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ☆912 · Updated last month
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ☆389 · Updated last year
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … ☆495 · Updated last year
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆590 · Updated 11 months ago
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions ☆348 · Updated 7 months ago
- Official implementation of SEED-LLaMA (ICLR 2024) ☆621 · Updated 11 months ago
- LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer ☆383 · Updated 4 months ago
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E… ☆487 · Updated 3 months ago
- LLM2CLIP makes SOTA pretrained CLIP models even more SOTA ☆541 · Updated 2 months ago
- Fine-tuning "ImageBind One Embedding Space to Bind Them All" with LoRA ☆191 · Updated last year
- ✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models ☆639 · Updated 8 months ago
- Long Context Transfer from Language to Vision ☆392 · Updated 5 months ago
- Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling" ☆861 · Updated last year
- Efficient Multimodal Large Language Models: A Survey ☆369 · Updated 4 months ago
- [CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding ☆391 · Updated 4 months ago
- Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Mod… ☆341 · Updated 5 months ago
- [AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference ☆287 · Updated 8 months ago
- PyTorch Implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs" ☆673 · Updated last year
- ✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis ☆635 · Updated 3 weeks ago