chenllliang / MMEvalPro
Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs
☆21 (updated 2 months ago)
Related projects:
- ☆13 (updated last month)
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" (☆38, updated 2 months ago)
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context" (☆21, updated 2 months ago)
- Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" (☆39, updated 3 months ago)
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models (☆52, updated 2 months ago)
- Code for the EMNLP 2023 Findings paper "Coarse-to-Fine Dual Encoders are Better Frame Identification Learners" (☆12, updated 11 months ago)
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models" (☆36, updated 2 months ago)
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism (☆23, updated 2 months ago)
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling (☆33, updated 6 months ago)
- 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies. Paper: https://arxiv.org/abs/2407.13623 (☆52, updated 3 weeks ago)
- ☆46 (updated 2 weeks ago)
- A comprehensive survey on Internal Consistency and Self-Feedback in Large Language Models (☆38, updated this week)
- This is the implementation of LeCo (☆16, updated 2 months ago)
- ☆31 (updated 3 months ago)
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models (☆42, updated last week)
- A Synthetic, Scalable and Systematic Evaluation Suite for Large Language Models (☆31, updated 3 months ago)
- Visual and Embodied Concepts evaluation benchmark (☆21, updated 11 months ago)
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073 (☆24, updated 2 months ago)
- Code implementation of synthetic continued pretraining (☆13, updated this week)
- The code and data for the paper JiuZhang3.0 (☆29, updated 3 months ago)
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" (☆54, updated 6 months ago)
- InstructRAG: Instructing Retrieval-Augmented Generation with Explicit Denoising (☆32, updated 2 months ago)
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI" (☆79, updated last month)
- ☆11 (updated last week)
- Benchmarking Benchmark Leakage in Large Language Models (☆39, updated 4 months ago)
- Evaluating Mathematical Reasoning Beyond Accuracy (☆32, updated 5 months ago)
- ☆24 (updated 7 months ago)
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023) (☆31, updated 5 months ago)
- "Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiw…☆19Updated 4 months ago
- Vision Large Language Models trained on M3IT instruction tuning dataset☆17Updated last year