multimodal-art-projection / OmniBench
A project for tri-modal LLM benchmarking and instruction tuning.
☆34 Updated last month
Alternatives and similar repositories for OmniBench:
Users who are interested in OmniBench are comparing it to the libraries listed below.
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea… ☆44 Updated this week
- Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042) ☆32 Updated last month
- LMM solved catastrophic forgetting, AAAI 2025 ☆41 Updated 3 weeks ago
- MIO: A Foundation Model on Multimodal Tokens ☆25 Updated 4 months ago
- ☆37 Updated 3 weeks ago
- PyTorch implementation of the model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES" ☆26 Updated 3 months ago
- A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model. ☆11 Updated 3 weeks ago
- ☆41 Updated this week
- ☆38 Updated 8 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆53 Updated last week
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder. Unifying image understanding and generation. ☆38 Updated 10 months ago
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild ☆47 Updated last year
- SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems ☆81 Updated last year
- ☆91 Updated 3 weeks ago
- ☆18 Updated 11 months ago
- The code and weight for LoVA. LoVA is a novel model for Long-form Video-to-Audio generation. Based on the Diffusion Transformer (DiT) arc… ☆13 Updated 2 months ago
- [ACL 2024] A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset ☆14 Updated 3 weeks ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆23 Updated 2 weeks ago
- ☆46 Updated 2 weeks ago
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension ☆95 Updated 4 months ago
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501 ☆55 Updated 9 months ago
- official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆35 Updated 3 months ago
- ☆17 Updated last year
- ☆40 Updated 3 weeks ago
- PyTorch implementation of StableMask (ICML'24) ☆12 Updated 10 months ago
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization ☆35 Updated 2 months ago
- ☆51 Updated last year
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" ☆24 Updated this week
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆24 Updated 4 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆42 Updated 2 months ago