[ACL 2026] A large-scale longitudinal study on robust and fair evaluation of LLMs — 200K+ generative questions across 13 disciplines
☆39May 21, 2026Updated 3 weeks ago
Alternatives and similar repositories for LLMEval-Fair
Users that are interested in LLMEval-Fair are comparing it to the libraries listed below.
- [AAAI 2024] LLMEval Phase II dataset — professional domain evaluation across 12 academic disciplines☆71May 21, 2026Updated 3 weeks ago
- [EMNLP 2023 Demo] "CLEVA: Chinese Language Models EVAluation Platform"☆64May 16, 2025Updated last year
- Chinese Generation Evaluation☆13Aug 14, 2023Updated 2 years ago
- Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning☆154Jun 1, 2026Updated 2 weeks ago
- [ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"☆19Jun 1, 2024Updated 2 years ago
- ☆10Mar 13, 2023Updated 3 years ago
- [ACL 2024] Making Long-Context Language Models Better Multi-Hop Reasoners☆20May 28, 2024Updated 2 years ago
- Official code for the paper Improving Language Plasticity via Pretraining with Active Forgetting, NeurIPS 2023☆22Mar 12, 2026Updated 3 months ago
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)☆102Feb 20, 2025Updated last year
- ☆13Aug 12, 2022Updated 3 years ago
- ☆13Mar 5, 2025Updated last year
- ☆47Oct 22, 2024Updated last year
- Functional and Learnable Cell dynamicS☆21May 13, 2025Updated last year
- ☆13Sep 26, 2024Updated last year
- ☆14Dec 14, 2023Updated 2 years ago
- Explanation of the llama2 repo.☆13Jul 18, 2024Updated last year
- code for Scaling Laws of RoPE-based Extrapolation☆73Oct 16, 2023Updated 2 years ago
- ☆46May 3, 2026Updated last month
- Official implementation of Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information☆12Sep 28, 2023Updated 2 years ago
- ICLR 2021 (spotlight): Graph Convolution with Low-rank Learnable Local Filters☆16Jan 14, 2021Updated 5 years ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔☆34Apr 20, 2025Updated last year
- Code release for "MORE: Multi-mOdal REtrieval Augmented Generative Commonsense Reasoning"☆11Oct 11, 2024Updated last year
- ☆22Jul 1, 2024Updated last year
- [ICLR 2025] "GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation", Tao Feng, Yihang Sun, Jiaxuan You☆18Mar 18, 2025Updated last year
- FlagEval is an evaluation toolkit for AI large foundation models.☆337Apr 24, 2025Updated last year
- Dataset containing Semantic Relations and Metadata, for Training and Evaluating Distributional Semantic Models in English and Mandarin Ch…☆16Aug 7, 2017Updated 8 years ago
- Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"☆145Feb 4, 2026Updated 4 months ago
- JudgeLRM: Large Reasoning Models as a Judge☆42May 6, 2026Updated last month
- GLM-SIMPLE-EVALS: The evaluation repository for the GLM-4.5 series of models by Z.ai.☆40Oct 17, 2025Updated 7 months ago
- (ACL 2025) 🔥🔥🔥Code for "Empowering Multimodal Large Language Models with Evol-Instruct"☆22May 15, 2025Updated last year
- Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models☆41Sep 30, 2024Updated last year
- Fire Together Wire Together: A Dynamic Pruning Approach with Self-Supervised Mask Prediction☆10May 25, 2022Updated 4 years ago
- Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision☆19Apr 1, 2025Updated last year
- Weighted Training for Cross-Task Learning☆15Feb 12, 2023Updated 3 years ago
- Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models☆35Oct 19, 2023Updated 2 years ago
- [AAAI 2023] This is the code for our paper "Neighborhood-Regularized Self-Training for Learning with Few Labels".☆12Jan 11, 2023Updated 3 years ago
- The official repo for "TheoremQA: A Theorem-driven Question Answering dataset" (EMNLP 2023)☆40May 15, 2024Updated 2 years ago
- A novel method named stDiff investigates the potential of employing diffusion models for single-cell omics generation.☆26Mar 13, 2024Updated 2 years ago
- ☆13Mar 11, 2025Updated last year