multimodal-art-projection / OmniBench
A project for tri-modal LLM benchmarking and instruction tuning.
☆23 · Updated last month
Alternatives and similar repositories for OmniBench:
Users who are interested in OmniBench are comparing it to the repositories listed below.
- MIO: A Foundation Model on Multimodal Tokens ☆21 · Updated 3 months ago
- PyTorch implementation of StableMask (ICML'24) ☆12 · Updated 8 months ago
- LMM solved catastrophic forgetting, AAAI2025 ☆39 · Updated 4 months ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a Vision Encoder. Unifying image understanding and generation. ☆35 · Updated 8 months ago
- ☆49 · Updated last year
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild ☆47 · Updated last year
- PyTorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES" ☆26 · Updated last month
- [Interspeech 2024] LiteFocus is a tool designed to accelerate diffusion-based TTA models, now implemented with the base model AudioLDM2. ☆33 · Updated this week
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model ☆21 · Updated 7 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆40 · Updated 8 months ago
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025] ☆14 · Updated 2 weeks ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆41 · Updated 2 weeks ago
- ☆17 · Updated 2 months ago
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆23 · Updated 5 months ago
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer ☆25 · Updated 2 months ago
- ☆22 · Updated 5 months ago
- ☆28 · Updated 6 months ago
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆33 · Updated last month
- ☆73 · Updated last year
- imagetokenizer is a Python package that helps you encode visuals and generate visual token ids from a codebook; supports both image and video… ☆30 · Updated 8 months ago
- Code for Findings of EMNLP2023 paper "Coarse-to-Fine Dual Encoders are Better Frame Identification Learners" ☆12 · Updated last year
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆44 · Updated last year
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆34 · Updated 8 months ago
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501 ☆51 · Updated 7 months ago
- Open-Pandora: On-the-fly Control Video Generation ☆32 · Updated 3 months ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models ☆43 · Updated 9 months ago
- Preference Learning for LLaVA ☆39 · Updated 4 months ago