thuiar / MMLA
The first comprehensive multimodal language analysis benchmark for evaluating foundation models
☆16 Updated last month
Alternatives and similar repositories for MMLA
Users who are interested in MMLA are comparing it to the repositories listed below
- ☆14 Updated 7 months ago
- ☆14 Updated last month
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ☆15 Updated 4 months ago
- Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability" ☆33 Updated last year
- instruction-following benchmark for large reasoning models ☆36 Updated last week
- ☆54 Updated last year
- On Path to Multimodal Generalist: General-Level and General-Bench ☆19 Updated last month
- Unsupervised GRPO ☆41 Updated 2 months ago
- ∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation ☆15 Updated 6 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆27 Updated 3 months ago
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" ☆26 Updated 3 months ago
- ☆49 Updated 5 months ago
- ☆38 Updated 9 months ago
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea… ☆93 Updated last month
- ☆53 Updated this week
- A project for tri-modal LLM benchmarking and instruction tuning. ☆42 Updated 4 months ago
- [NeurIPS 2024] | An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding ☆18 Updated 10 months ago
- [arxiv: 2505.02156] Adaptive Thinking via Mode Policy Optimization for Social Language Agents ☆40 Updated last month
- ☆91 Updated last month
- ACL'2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs. and preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of… ☆38 Updated 2 months ago
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆40 Updated last month
- ☆12 Updated 6 months ago
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM" ☆16 Updated last week
- [ACM MM25] Official Pytorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding] ☆13 Updated last month
- [ACL 2025 (Findings)] DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling ☆16 Updated 8 months ago
- ☆24 Updated last week
- [ICLR 2025] Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization ☆12 Updated 6 months ago
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24) ☆25 Updated 9 months ago
- code for "CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models" ☆19 Updated 5 months ago
- Chinese Vision-Language Understanding Evaluation ☆24 Updated 7 months ago