kyegomez / Mirasol
PyTorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"
☆26Updated 4 months ago
Alternatives and similar repositories for Mirasol
Users who are interested in Mirasol are comparing it to the libraries listed below
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search"☆25Updated last month
- PyTorch implementation of StableMask (ICML'24)☆13Updated 11 months ago
- MIO: A Foundation Model on Multimodal Tokens☆25Updated 5 months ago
- LMM solved catastrophic forgetting, AAAI 2025☆43Updated last month
- ☆18Updated last year
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion☆45Updated 4 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"☆43Updated 3 months ago
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding☆51Updated last year
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"☆20Updated 2 months ago
- ☆51Updated last year
- Project for SNARE benchmark☆11Updated last year
- ☆43Updated 2 weeks ago
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"☆19Updated last year
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla…☆60Updated 7 months ago
- Language Repository for Long Video Understanding☆31Updated 11 months ago
- Implementation of "PaLM2-VAdapter:" from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron…☆15Updated 6 months ago
- ☆42Updated 6 months ago
- [NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat…☆46Updated 9 months ago
- Preference Learning for LLaVA☆45Updated 6 months ago
- Data for evaluating GPT-4V☆11Updated last year
- A Comprehensive Benchmark for Robust Multi-image Understanding☆11Updated 9 months ago
- ☆77Updated 4 months ago
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model☆22Updated 10 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence☆48Updated 10 months ago
- ☆31Updated last year
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea…☆51Updated last week
- Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue (ACL 2024)☆23Updated 9 months ago
- [EMNLP 2023] Context Compression for Auto-regressive Transformers with Sentinel Tokens☆24Updated last year
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"☆34Updated 10 months ago
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025☆25Updated last month