kyegomez / Mirasol
PyTorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"
☆26Updated 4 months ago
Alternatives and similar repositories for Mirasol
Users who are interested in Mirasol are comparing it to the libraries listed below
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search"☆25Updated last month
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding☆51Updated last year
- ☆50Updated last year
- ☆43Updated last month
- ☆36Updated last year
- Language Repository for Long Video Understanding☆31Updated last year
- ☆24Updated last year
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"☆20Updated 2 months ago
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla…☆60Updated 8 months ago
- Implementation of Qformer from BLIP2 in Zeta Lego blocks.☆39Updated 7 months ago
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion☆46Updated 5 months ago
- PyTorch implementation of StableMask (ICML'24)☆13Updated 11 months ago
- ☆31Updated last year
- Project for SNARE benchmark☆11Updated last year
- ☆42Updated 7 months ago
- LMM solved catastrophic forgetting, AAAI2025☆43Updated 2 months ago
- A Comprehensive Benchmark for Robust Multi-image Understanding☆11Updated 9 months ago
- ☆80Updated 5 months ago
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs☆24Updated 9 months ago
- [ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges☆70Updated 4 months ago
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models☆30Updated 8 months ago
- Official code and dataset for our EMNLP 2024 Findings paper: Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Kn…☆19Updated 5 months ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆39Updated 3 months ago
- Data for evaluating GPT-4V☆11Updated last year
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"☆44Updated 3 months ago
- MIO: A Foundation Model on Multimodal Tokens☆27Updated 6 months ago
- A project for tri-modal LLM benchmarking and instruction tuning.☆38Updated 2 months ago
- Code for paper: Unified Text-to-Image Generation and Retrieval☆15Updated 11 months ago
- ☆18Updated 11 months ago
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆38Updated 8 months ago