csuhan / OneLLM
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
☆665 · Updated last year
Alternatives and similar repositories for OneLLM
Users interested in OneLLM are comparing it to the libraries listed below:
- 【ICLR 2024🔥】Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment ☆852 · Updated last year
- [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding ☆670 · Updated 10 months ago
- ☆632 · Updated last year
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆763 · Updated last year
- VisionLLM Series ☆1,130 · Updated 9 months ago
- A Framework of Small-scale Large Multimodal Models ☆933 · Updated 7 months ago
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ☆318 · Updated last year
- (CVPR 2024) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding ☆344 · Updated last year
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆599 · Updated last year
- LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024) ☆851 · Updated last year
- [NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" ☆523 · Updated last year
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ☆929 · Updated 4 months ago
- [CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding ☆402 · Updated 7 months ago
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of… ☆503 · Updated last year
- PyTorch Implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs" ☆682 · Updated last year
- Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Mod… ☆352 · Updated 8 months ago
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ☆470 · Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆460 · Updated last year
- [AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference ☆289 · Updated 11 months ago
- Efficient Multimodal Large Language Models: A Survey ☆376 · Updated 7 months ago
- ✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models ☆642 · Updated 11 months ago
- Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling" ☆862 · Updated last year
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆332 · Updated last year
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ☆356 · Updated 11 months ago
- Long Context Transfer from Language to Vision ☆398 · Updated 8 months ago
- LLM2CLIP makes a SOTA pretrained CLIP model even more SOTA. ☆568 · Updated last week
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ☆390 · Updated last year
- Official implementation of SEED-LLaMA (ICLR 2024). ☆636 · Updated last year
- LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs ☆404 · Updated 2 weeks ago
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models ☆285 · Updated last year