csuhan / OneLLM
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
☆668 · Updated last year
Alternatives and similar repositories for OneLLM
Users interested in OneLLM are comparing it to the libraries listed below
- 【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment ☆865 · Updated last year
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆763 · Updated 2 years ago
- [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding ☆682 · Updated last year
- ☆643 · Updated last year
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … ☆504 · Updated last year
- VisionLLM Series ☆1,137 · Updated 11 months ago
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ☆317 · Updated last year
- [NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" ☆525 · Updated 2 years ago
- [CVPR 2024] MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding ☆344 · Updated last year
- PyTorch Implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs" ☆688 · Updated 2 years ago
- Official implementation of SEED-LLaMA (ICLR 2024). ☆639 · Updated last year
- LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs ☆413 · Updated last month
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆602 · Updated last year
- [AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference ☆293 · Updated last year
- LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024) ☆859 · Updated last year
- ☆805 · Updated last year
- ✨✨ Woodpecker: Hallucination Correction for Multimodal Large Language Models ☆650 · Updated last year
- LLM2CLIP makes a SOTA pretrained CLIP model even more SOTA ☆617 · Updated last week
- Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag… ☆556 · Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆458 · Updated last year
- Research Trends in LLM-guided Multimodal Learning. ☆357 · Updated 2 years ago
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ☆471 · Updated 2 years ago
- [CVPR 2024] A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ☆360 · Updated last year
- [ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model ☆342 · Updated last year
- A family of lightweight multimodal models. ☆1,050 · Updated last year
- A Framework of Small-scale Large Multimodal Models ☆960 · Updated 9 months ago
- MMICL (PKU): a state-of-the-art VLM with multimodal in-context learning ability ☆360 · Updated 2 years ago
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E… ☆544 · Updated 8 months ago
- [CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding ☆409 · Updated 9 months ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ☆943 · Updated 6 months ago