csuhan / OneLLM
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
☆658 · Updated last year
Alternatives and similar repositories for OneLLM
Users interested in OneLLM are comparing it to the libraries listed below.
- 【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment ☆844 · Updated last year
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆760 · Updated last year
- [NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" ☆522 · Updated last year
- [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding ☆664 · Updated 9 months ago
- ☆629 · Updated last year
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ☆317 · Updated last year
- VisionLLM Series ☆1,121 · Updated 8 months ago
- [AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference ☆289 · Updated 10 months ago
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ☆468 · Updated last year
- [CVPR 2024] MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding ☆341 · Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆460 · Updated 11 months ago
- [CVPR 2024] A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ☆353 · Updated 9 months ago
- LLM2CLIP makes the SOTA pretrained CLIP model even more SOTA. ☆561 · Updated 4 months ago
- PyTorch Implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs" ☆680 · Updated last year
- Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Mod… ☆344 · Updated 7 months ago
- A Framework of Small-scale Large Multimodal Models ☆914 · Updated 6 months ago
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆596 · Updated last year
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … ☆498 · Updated last year
- LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024) ☆845 · Updated last year
- [CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding ☆397 · Updated 6 months ago
- Efficient Multimodal Large Language Models: A Survey ☆375 · Updated 6 months ago
- Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag… ☆546 · Updated last year
- Official implementation of SEED-LLaMA (ICLR 2024). ☆632 · Updated last year
- ✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models ☆638 · Updated 10 months ago
- LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer ☆390 · Updated this week
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆331 · Updated last year
- Research Trends in LLM-guided Multimodal Learning. ☆355 · Updated 2 years ago
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) ☆312 · Updated 9 months ago
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models ☆277 · Updated last year
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" ☆862 · Updated 6 months ago