Scalable Meta-Evaluation of LLMs as Evaluators
☆43Feb 15, 2024Updated 2 years ago
Alternatives and similar repositories for scaleeval
Users that are interested in scaleeval are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆13Jul 14, 2024Updated last year
- BeHonest: Benchmarking Honesty in Large Language Models☆35Aug 15, 2024Updated last year
- ☆78May 22, 2024Updated last year
- [ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation"☆43Jul 19, 2024Updated last year
- ☆51Mar 2, 2024Updated 2 years ago
- End-to-end encrypted email - Proton Mail • AdSpecial offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples'☆81Jan 18, 2024Updated 2 years ago
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI☆107Mar 6, 2025Updated last year
- [ICLR 2026] Efficient Agent Training for Computer Use☆139Sep 5, 2025Updated 7 months ago
- ☆27Mar 27, 2025Updated last year
- Collections of RLxLM experiments using minimal codes☆14Feb 17, 2025Updated last year
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy☆78Oct 9, 2025Updated 6 months ago
- ☆25May 16, 2024Updated last year
- ☆12Sep 23, 2024Updated last year
- Trending projects & awesome papers about data-centric llm studies.☆40May 20, 2025Updated 10 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- Generative Judge for Evaluating Alignment☆248Jan 18, 2024Updated 2 years ago
- LLM evaluation.☆16Nov 7, 2023Updated 2 years ago
- Code for "[COLM'25] RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing"☆24Mar 18, 2025Updated last year
- Align, a general text alignment function☆15Dec 7, 2023Updated 2 years ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆147Apr 9, 2025Updated last year
- Benchmarking Benchmark Leakage in Large Language Models☆60May 20, 2024Updated last year
- An end-to-end benchmark suite of multi-modal DNN applications for system-architecture co-design☆22Dec 13, 2024Updated last year
- [Findings of EMNLP'2024] Unified Active Retrieval for Retrieval Augmented Generation☆23Sep 30, 2024Updated last year
- Just a bunch of benchmark logs for different LLMs☆121Jul 28, 2024Updated last year
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- ☆341May 24, 2025Updated 10 months ago
- This code implements the algorithm of FIPO, a value-free RL recipe for eliciting deeper reasoning from a clean base model.☆89Updated this week
- [NeurlPS D&B 2024] Generative AI for Math: MathPile☆420Apr 4, 2025Updated last year
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling☆186Jul 23, 2025Updated 8 months ago
- ☆17Dec 12, 2020Updated 5 years ago
- SOTA Math Opensource LLM☆335Dec 12, 2023Updated 2 years ago
- [COLM 2025] An Open Math Pre-trainng Dataset with 370B Tokens.☆110Apr 4, 2025Updated last year
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)☆62Mar 30, 2024Updated 2 years ago
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".☆17Feb 9, 2026Updated 2 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- FacTool: Factuality Detection in Generative AI☆922Aug 19, 2024Updated last year
- This is the oficial repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated 2 years ago
- Echo Noise Channel for Exact Mutual Information Calculation☆17Jul 17, 2020Updated 5 years ago
- ☆21Dec 5, 2022Updated 3 years ago
- Offcial Repo of Paper "Eliminating Position Bias of Language Models: A Mechanistic Approach""☆21Jun 13, 2025Updated 9 months ago
- EMNLP 2022: Finding Dataset Shortcuts with Grammar Induction https://arxiv.org/abs/2210.11560☆58Feb 28, 2025Updated last year
- ☆14Aug 9, 2024Updated last year