math-eval / MathEval
MathEval is a benchmark dedicated to the holistic evaluation of the mathematical capabilities of LLMs.
★84 · Updated last year
Alternatives and similar repositories for MathEval
Users interested in MathEval are comparing it to the repositories listed below.
- ★147 · Updated last year
- An unofficial implementation of Self-Alignment with Instruction Backtranslation. ★138 · Updated 7 months ago
- ★87 · Updated last year
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track) ★97 · Updated 9 months ago
- Fantastic Data Engineering for Large Language Models ★92 · Updated 11 months ago
- Do Large Language Models Know What They Don't Know? ★102 · Updated last year
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ★184 · Updated 5 months ago
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning ★284 · Updated 2 years ago
- Collection of papers for scalable automated alignment. ★94 · Updated last year
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models ★267 · Updated last year
- Code implementation of synthetic continued pretraining ★142 · Updated 11 months ago
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models" ★136 · Updated last year
- ★146 · Updated last year
- ★130 · Updated 6 months ago
- A curated reading list for large language model (LLM) alignment. Take a look at our new survey "Large Language Model Alignment: A Survey"… ★81 · Updated 2 years ago
- Generative Judge for Evaluating Alignment ★248 · Updated last year
- Logiqa2.0 dataset - logical reasoning in MRC and NLI tasks ★100 · Updated 2 years ago
- Counting-Stars (★) ★83 · Updated last week
- ★315 · Updated last year
- Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs. ★44 · Updated last year
- Project for the paper entitled `Instruction Tuning for Large Language Models: A Survey` ★203 · Updated 3 months ago
- [ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models ★117 · Updated 5 months ago
- [ICML 2024] Selecting High-Quality Data for Training Language Models ★193 · Updated last year
- ★51 · Updated last year
- The implementation of the paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee…" ★37 · Updated last year
- [ICML'2024] Can AI Assistants Know What They Don't Know? ★84 · Updated last year
- [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale ★263 · Updated 4 months ago
- Repository for the paper "Cognitive Mirage: A Review of Hallucinations in Large Language Models" ★47 · Updated 2 years ago
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ★71 · Updated 6 months ago
- [ACL 2023] This is the code repo for our ACL'23 paper "Augmentation-Adapted Retriever Improves Generalization of Language Models as Gener…" ★60 · Updated last year