Leezekun / MMSci
MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension
☆40 · Updated 3 months ago
Alternatives and similar repositories for MMSci:
Users that are interested in MMSci are comparing it to the libraries listed below
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers" [NeurIPS D&B, 2024] ☆55 · Updated 2 months ago
- PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging" ☆38 · Updated last year
- The source code for running LLMs on the AAAR-1.0 benchmark. ☆16 · Updated 3 weeks ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆50 · Updated 5 months ago
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073 ☆28 · Updated 8 months ago
- Official Repository of Are Your LLMs Capable of Stable Reasoning? ☆22 · Updated last week
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆65 · Updated 9 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆45 · Updated 2 months ago
- [NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs ☆39 · Updated 3 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆80 · Updated 6 months ago
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o… ☆22 · Updated 3 months ago
- The code and data for the paper JiuZhang3.0 ☆42 · Updated 9 months ago
- This is the implementation of LeCo ☆32 · Updated 2 months ago
- Official Code of IdealGPT ☆34 · Updated last year
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning ☆84 · Updated last month
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization ☆31 · Updated 3 weeks ago
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs ☆24 · Updated 5 months ago
- A trainable user simulator ☆34 · Updated 6 months ago
- Source code for the paper "Prefix Language Models are Unified Modal Learners" ☆43 · Updated last year
- Evaluate the Quality of Critique ☆35 · Updated 9 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆51 · Updated last month
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆55 · Updated 3 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ☆40 · Updated 3 months ago
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large … ☆74 · Updated 3 months ago
- Code and Data Repo for [NeurIPS 2024] Paper "Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning" ☆23 · Updated 9 months ago
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models ☆40 · Updated 2 weeks ago
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆23 · Updated 6 months ago