SuperCLUE-Math6: Exploring a New Generation of Chinese-Native Multi-Turn, Multi-Step Mathematical Reasoning Datasets
☆58Feb 5, 2024Updated 2 years ago
Alternatives and similar repositories for SuperCLUE-Math6
Users who are interested in SuperCLUE-Math6 are comparing it to the repositories listed below
- Code for "SCL-RAI: Span-based Contrastive Learning with Retrieval Augmented Inference for Unlabeled Entity Problem in NER" @COLING-2022☆11Aug 20, 2022Updated 3 years ago
- [EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing☆14Feb 10, 2023Updated 3 years ago
- Unifew: Unified Fewshot Learning Model☆18Sep 10, 2021Updated 4 years ago
- This is the repository of the Ape210K dataset and baseline models.☆199Dec 10, 2019Updated 6 years ago
- Source code for NeurIPS 2020 paper "Node Classification on Graphs with Few-Shot Novel Labels via Meta Transformed Network Embedding"☆10Nov 17, 2020Updated 5 years ago
- Chinese large language model evaluation: 2024 Gaokao mathematics special topic☆19Jun 14, 2024Updated last year
- GAOKAO-Bench is an evaluation framework that utilizes GAOKAO questions as a dataset to evaluate large language models.☆728Jan 7, 2025Updated last year
- [CIKM 2025] Constraint Back-translation Improves Complex Instruction Following of Large Language Models☆17May 23, 2025Updated 9 months ago
- ☆23Jan 31, 2025Updated last year
- [ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models☆119Jun 12, 2025Updated 9 months ago
- A Chinese-native, graded benchmark for code ability☆15Apr 11, 2024Updated last year
- [NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities.☆131May 16, 2025Updated 10 months ago
- [R]einforcement [L]earning from [M]odel-rewarded [T]hinking - code for the paper "Language Models That Think, Chat Better"☆125Oct 27, 2025Updated 4 months ago
- A Chinese-native evaluation benchmark for retrieval-augmented generation☆126Apr 18, 2024Updated last year
- GAOGAO-Bench-Updates is a supplement to the GAOKAO-Bench, a dataset to evaluate large language models.☆39Jan 7, 2025Updated last year
- Official github repo for E-Eval, a Chinese K12 education evaluation benchmark for LLMs.☆29Feb 19, 2024Updated 2 years ago
- ☆19Jul 15, 2022Updated 3 years ago
- The rule-based evaluation subset and code implementation of Omni-MATH☆26Dec 23, 2024Updated last year
- ☆485Jul 22, 2024Updated last year
- Question-Directed Graph Attention Network for Numerical Reasoning over Text☆10Aug 14, 2020Updated 5 years ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L…☆53Jun 24, 2024Updated last year
- ☆342Jun 5, 2025Updated 9 months ago
- The official repository of the Omni-MATH benchmark.☆93Dec 22, 2024Updated last year
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)☆102Feb 20, 2025Updated last year
- A wiki platform for the students and teachers of Tsinghua University☆16Updated this week
- [ACL 2024] A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset☆24May 29, 2025Updated 9 months ago
- A structured parsing technique for NER☆15May 26, 2023Updated 2 years ago
- Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models☆89Apr 4, 2024Updated last year
- Gaokao Benchmark for AI☆108Jul 8, 2022Updated 3 years ago
- EMNLP2023 - InfoSeek: A New VQA Benchmark focus on Visual Info-Seeking Questions☆25May 30, 2024Updated last year
- ☆164Apr 17, 2023Updated 2 years ago
- Self-Critical Sequence Training for Image Captioning☆21May 27, 2017Updated 8 years ago
- ☆12Mar 22, 2024Updated last year
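- A plugin that takes styles copied from the browser and outputs them as tailwindcss syntax at the corresponding location, and shows the matching tailwindcss syntax when hovering over native CSS☆13Nov 21, 2025Updated 4 months ago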
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.☆329Jan 29, 2026Updated last month
- Explicit Sentence Compression for Neural Machine Translation☆10May 12, 2020Updated 5 years ago
- [ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset☆112May 22, 2025Updated 9 months ago
- ☆19Dec 6, 2024Updated last year
- ☆17Jul 4, 2025Updated 8 months ago