Official GitHub repository for E-EVAL, a Chinese K12 education evaluation benchmark for LLMs.
☆29Feb 19, 2024Updated 2 years ago
Alternatives and similar repositories for E-EVAL
Users who are interested in E-EVAL are comparing it to the repositories listed below.
- CEduMEval : A Chinese educational multi-task evaluation benchmark☆16Nov 18, 2024Updated last year
- ☆17Oct 15, 2023Updated 2 years ago
- ☆25Apr 8, 2025Updated 10 months ago
- ☆10Mar 8, 2024Updated last year
- Official code for the paper "Joint Bayesian Inference of Graphical Structure and Parameters with a Single Generative Flow Network"☆16Aug 9, 2023Updated 2 years ago
- Math24o: High School Olympiad Mathematics Chinese Benchmark☆11Mar 27, 2025Updated 11 months ago
- ☆185Apr 30, 2025Updated 10 months ago
- [AAAI 2026] AutoTool: Efficient Tool Selection for Large Language Model Agents☆29Dec 28, 2025Updated 2 months ago
- ☆12Mar 21, 2024Updated last year
- ☆13Jan 31, 2023Updated 3 years ago
- Urban Generative Intelligence (UGI): A Foundational Platform for Embodied Agent and Future City☆12Dec 17, 2023Updated 2 years ago
- ☆12Nov 21, 2023Updated 2 years ago
- Example of evaluation metrics used in the SynthRAD2023 challenge☆11Jul 14, 2023Updated 2 years ago
- ⚖️ Code for the paper "Ethical Adversaries: Towards Mitigating Unfairness with Adversarial Machine Learning".☆11Dec 8, 2022Updated 3 years ago
- Active Learning Helps Pretrained Models Learn the Intended Task (https://arxiv.org/abs/2204.08491) by Alex Tamkin, Dat Nguyen, Salil Desh…☆11Nov 22, 2022Updated 3 years ago
- BERT score for text generation☆12Jan 15, 2025Updated last year
- ☆12Jan 14, 2026Updated last month
- Official code of "The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets"☆23Sep 20, 2025Updated 5 months ago
- Official repository for "EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scena…☆19May 28, 2025Updated 9 months ago
- ☆11Nov 9, 2020Updated 5 years ago
- PhysReason Benchmark☆19Jul 8, 2025Updated 7 months ago
- Multi-turn RL framework for aligning models to be tutors instead of answerers. EMNLP 2025 Oral☆31Dec 11, 2025Updated 2 months ago
- ☆10Dec 29, 2020Updated 5 years ago
- ☆11Dec 11, 2024Updated last year
- Code for "SCL-RAI: Span-based Contrastive Learning with Retrieval Augmented Inference for Unlabeled Entity Problem in NER" @COLING-2022☆11Aug 20, 2022Updated 3 years ago
- Code and extra figures as part of the thesis about Relative transfer function estimation for multi-microphone speech enhancement based on…☆11Jan 10, 2018Updated 8 years ago
- code and data associated with CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations☆11Oct 13, 2023Updated 2 years ago
- Exploring classifier-free guidance in a DDPM language model for text generation towards emotion targets.☆11Sep 7, 2025Updated 5 months ago
- Code for Expert Supervised Reinforcement Learning☆10Apr 7, 2021Updated 4 years ago
- Dynamic Traffic Assignment☆16Aug 25, 2020Updated 5 years ago
- A pipeline for the automatic construction of geometry problems along with step-by-step solutions.☆17Aug 27, 2025Updated 6 months ago
- A native-Chinese, level-graded benchmark for code capability☆15Apr 11, 2024Updated last year
- ☆14Nov 29, 2020Updated 5 years ago
- Auto-differentiation library for C++☆12Jan 16, 2022Updated 4 years ago
- AI for Mathematics Paper List☆17Jan 14, 2025Updated last year
- This is a complete online exam system☆10Dec 27, 2019Updated 6 years ago
- LLM Application Systems for Education☆11May 16, 2025Updated 9 months ago
- Networkx implementation of Yen's k shortest paths algorithm.☆11Nov 6, 2018Updated 7 years ago
- Reproducing several bandwidth-based traffic signal coordination models (including MaxBand, MultiBand, etc.)☆11Sep 18, 2020Updated 5 years ago