SuperCLUE-Math6: An Exploration of a New Generation of Native Chinese Multi-Turn, Multi-Step Mathematical Reasoning Datasets
☆60 · Feb 5, 2024 · Updated 2 years ago
Alternatives and similar repositories for SuperCLUE-Math6
Users who are interested in SuperCLUE-Math6 are comparing it to the repositories listed below.
- Code for "SCL-RAI: Span-based Contrastive Learning with Retrieval Augmented Inference for Unlabeled Entity Problem in NER" @COLING-2022 ☆11 · Aug 20, 2022 · Updated 3 years ago
- [EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing ☆14 · Feb 10, 2023 · Updated 3 years ago
- Unifew: Unified Fewshot Learning Model ☆18 · Sep 10, 2021 · Updated 4 years ago
- [EMNLP 2025] Verification Engineering for RL in Instruction Following ☆56 · Mar 30, 2026 · Updated 2 months ago
- Source code for NeurIPS 2020 paper "Node Classification on Graphs with Few-Shot Novel Labels via Meta Transformed Network Embedding" ☆10 · Nov 17, 2020 · Updated 5 years ago
- ☆11 · Aug 4, 2024 · Updated last year
- GAOKAO-Bench is an evaluation framework that utilizes GAOKAO questions as a dataset to evaluate large language models. ☆761 · Jan 7, 2025 · Updated last year
- LLM evaluation on 2024 Chinese Gaokao Mathematics: a zero-contamination benchmark with dual prompt formats ☆21 · Apr 15, 2026 · Updated 2 months ago
- [CIKM 2025] Constraint Back-translation Improves Complex Instruction Following of Large Language Models ☆19 · May 23, 2025 · Updated last year
- An Annotated Question Answering Dataset for Assisting Chinese Python Programming Learners ☆10 · Feb 23, 2024 · Updated 2 years ago
- Source code and dataset for TKDE'22 paper "Region or Global? A Principle for Negative Sampling in Graph-based Recommendation" ☆13 · Mar 15, 2022 · Updated 4 years ago
- [ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models ☆118 · Jun 12, 2025 · Updated last year
- [NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities. ☆139 · May 16, 2025 · Updated last year
- A native Chinese graded benchmark for code capability ☆15 · Apr 11, 2024 · Updated 2 years ago
- [R]einforcement [L]earning from [M]odel-rewarded [T]hinking - code for the paper "Language Models That Think, Chat Better" ☆129 · Oct 27, 2025 · Updated 8 months ago
- A native Chinese evaluation benchmark for retrieval-augmented generation ☆131 · Apr 18, 2024 · Updated 2 years ago
- GAOKAO-Bench-Updates is a supplement to GAOKAO-Bench, a dataset to evaluate large language models. ☆46 · Jan 7, 2025 · Updated last year
- A full codebase for replicating the results of Nougat from downloading arXiv dataset to the final evaluation. It also contains a few fixe… ☆11 · Dec 11, 2023 · Updated 2 years ago
- Official github repo for E-Eval, a Chinese K12 education evaluation benchmark for LLMs.