kyegomez / Mirasol
PyTorch implementation of the model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"
☆24Updated last week
Related projects
Alternatives and complementary repositories for Mirasol
- ☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion☆27Updated 5 months ago
- PyTorch implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States"☆23Updated last week
- Implementation of the MC-ViT model from the paper: "Memory Consolidation Enables Long-Context Video Understanding"☆16Updated last week
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆42Updated 3 weeks ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".☆82Updated 4 months ago
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding☆47Updated 10 months ago
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely…☆47Updated last year
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …☆89Updated last week
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆32Updated last month
- Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"☆40Updated 4 months ago
- PyTorch implementation of StableMask (ICML'24)☆12Updated 4 months ago
- Language Repository for Long Video Understanding☆28Updated 5 months ago
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning☆73Updated 6 months ago
- Implementation of "the first large-scale multimodal mixture of experts models" from the paper: "Multimodal Contrastive Learning with…☆22Updated last week
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model☆25Updated last week
- Implementation of Qformer from BLIP2 in Zeta Lego blocks.☆31Updated last week
- VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)☆22Updated 4 months ago
- Code release for the paper "Egocentric Video Task Translation" (CVPR 2023 Highlight)☆31Updated last year
- Data-Efficient Multimodal Fusion on a Single GPU☆47Updated 6 months ago
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla…☆45Updated last month
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild☆48Updated last year
- An LMM that is a strict superset of its embedded LLM☆30Updated 2 weeks ago
- Official implementation of MIA-DPO☆40Updated 2 weeks ago
- LAVIS - A One-stop Library for Language-Vision Intelligence☆47Updated 3 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback☆52Updated 2 months ago
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria☆55Updated last month