kyegomez / Mirasol
PyTorch implementation of the model from "Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities"
☆24 · Updated this week
Related projects
Alternatives and complementary repositories for Mirasol
- ☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion · ☆27 · Updated 4 months ago
- PyTorch implementation of the paper "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" · ☆23 · Updated this week
- Data-Efficient Multimodal Fusion on a Single GPU · ☆47 · Updated 6 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters" · ☆41 · Updated last week
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding · ☆47 · Updated 10 months ago
- ☆37 · Updated last year
- The official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" · ☆30 · Updated 3 weeks ago
- Implementation of the model "MC-ViT" from the paper "Memory Consolidation Enables Long-Context Video Understanding" · ☆16 · Updated this week
- Implementation of the Q-Former from BLIP-2 in Zeta Lego blocks · ☆31 · Updated this week
- Language Repository for Long Video Understanding · ☆28 · Updated 4 months ago
- Data for evaluating GPT-4V · ☆11 · Updated last year
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models" · ☆81 · Updated 4 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs · ☆25 · Updated 2 weeks ago
- [ICLR 2024] The official implementation of the paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by … · ☆69 · Updated 9 months ago
- SMILE: A Multimodal Dataset for Understanding Laughter · ☆13 · Updated last year
- Project for the SNARE benchmark · ☆10 · Updated 5 months ago
- A project for tri-modal LLM benchmarking and instruction tuning · ☆13 · Updated this week
- Code for the ICML 2024 paper "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition" · ☆40 · Updated 4 months ago
- Implementation of "PaLM2-VAdapter" from the multimodal model paper "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron… · ☆16 · Updated this week
- The official code for the paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs" · ☆72 · Updated 8 months ago
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely… · ☆47 · Updated last year
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models · ☆41 · Updated 4 months ago
- LAVIS: A One-stop Library for Language-Vision Intelligence · ☆48 · Updated 3 months ago
- An official codebase for the paper "CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos" (ICCV 23) · ☆52 · Updated last year
- ☆72 · Updated 5 months ago
- A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset (ACL 2024) · ☆14 · Updated last month
- ☆45 · Updated last year
- ACL 2024 (Findings): TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild · ☆48 · Updated last year
- ☆24 · Updated last year
- Code and models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model · ☆39 · Updated last year