math-eval / MathEval
MathEval is a benchmark dedicated to the holistic evaluation of the mathematical capabilities of LLMs.
☆84 · Updated 11 months ago
Alternatives and similar repositories for MathEval
Users interested in MathEval are comparing it to the libraries listed below.
- ☆83 · Updated last year
- An unofficial implementation of Self-Alignment with Instruction Backtranslation ☆139 · Updated 5 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆178 · Updated 3 months ago
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track) ☆95 · Updated 7 months ago
- ☆147 · Updated last year
- ☆83 · Updated last year
- Fantastic Data Engineering for Large Language Models ☆90 · Updated 9 months ago
- Collection of papers for scalable automated alignment ☆93 · Updated 11 months ago
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning ☆276 · Updated 2 years ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L…" ☆51 · Updated last year
- Code implementation of synthetic continued pretraining ☆135 · Updated 9 months ago
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ☆69 · Updated 5 months ago
- ☆51 · Updated last year
- [ICML 2024] Can AI Assistants Know What They Don't Know? ☆83 · Updated last year
- Implementation of the paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee…" ☆37 · Updated last year
- LogiQA 2.0 dataset: logical reasoning in MRC and NLI tasks ☆99 · Updated 2 years ago
- Counting-Stars ☆83 · Updated 4 months ago
- Towards Systematic Measurement for Long Text Quality ☆36 · Updated last year
- ☆145 · Updated last year
- Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs ☆44 · Updated last year
- A curated reading list for large language model (LLM) alignment. Take a look at our new survey "Large Language Model Alignment: A Survey"… ☆81 · Updated 2 years ago
- Unofficial implementation of AlpaGasus ☆93 · Updated 2 years ago
- ☆49 · Updated last year
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation ☆89 · Updated 11 months ago
- [ICML 2024] Selecting High-Quality Data for Training Language Models ☆189 · Updated last year
- Do Large Language Models Know What They Don't Know? ☆99 · Updated 11 months ago
- [ACL 2024] Official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning" ☆131 · Updated 11 months ago
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues ☆120 · Updated last year
- Repository for the paper "Cognitive Mirage: A Review of Hallucinations in Large Language Models" ☆47 · Updated last year
- ☆312 · Updated last year