llmeval / Llmeval-Gaokao2024-Math
Chinese Large Language Model Evaluation: 2024 Gaokao Mathematics Special Topic
☆16Updated 8 months ago
Alternatives and similar repositories for Llmeval-Gaokao2024-Math:
Users who are interested in Llmeval-Gaokao2024-Math are comparing it to the repositories listed below
- Chinese Large Language Model Evaluation, Phase 3☆24Updated 8 months ago
- ☆81Updated 10 months ago
- The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee…☆38Updated 6 months ago
- ☆36Updated 5 months ago
- Gaokao Benchmark for AI☆105Updated 2 years ago
- ☆91Updated 2 months ago
- GAOGAO-Bench-Updates is a supplement to the GAOKAO-Bench, a dataset to evaluate large language models.☆25Updated last month
- ☆95Updated last year
- Chinese Large Language Model Evaluation, Phase 2☆70Updated last year
- ☆88Updated 10 months ago
- The code of arxiv paper: "CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis"☆21Updated last month
- Repo for the paper "AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction".☆59Updated 6 months ago
- ☆139Updated 7 months ago
- ☆48Updated 11 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments"☆58Updated 11 months ago
- [ACL 2024] CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling☆91Updated 4 months ago
- A Toolkit for Table-based Question Answering☆109Updated last year
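- This project covers dataset synthesis, model training, and evaluation for the mathematical problem-solving ability of large models, along with notes on related articles.☆73Updated 5 months ago
- A survey of large language model training and serving☆36Updated last year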
- Light local website for displaying performances from different chat models.☆85Updated last year
- ☆80Updated last year
- HanFei-1.0 (韩非), the first legal large language model in China trained with full-parameter training☆113Updated last year
- [ICML'2024] Can AI Assistants Know What They Don't Know?☆78Updated last year
- ☆42Updated 2 months ago
- ☆45Updated 8 months ago
- A repo showcasing the use of MCTS with LLMs to solve gsm8k problems☆49Updated last month
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"☆118Updated 8 months ago
- ☆53Updated 3 months ago
- ☆16Updated last year
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios☆65Updated 2 months ago