☆28Feb 28, 2026Updated this week
Alternatives and similar repositories for OJBench
Users who are interested in OJBench are comparing it to the repositories listed below
- [NeurIPS 2025 D&B (Spotlight🌟)] TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenario☆29Oct 5, 2025Updated 5 months ago
- ☆56Oct 27, 2025Updated 4 months ago
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs☆24Sep 26, 2024Updated last year
- The official repository of the Omni-MATH benchmark.☆93Dec 22, 2024Updated last year
- The rule-based evaluation subset and code implementation of Omni-MATH☆26Dec 23, 2024Updated last year
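- A very simple reproduction of Deepseek-R1-Zero and Deepseek-R1, using the "24 game" as an example. Uses zero-RL, SFT, and SFT+RL to elicit the LLM's capacity for self-verification and reflection. About Clean, minimal, accessible reproduction of Dee…☆34Apr 5, 2025Updated 11 months ago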
- ☆36Jul 7, 2025Updated 7 months ago
- ☆10Nov 14, 2024Updated last year
- Personal website of 赵纯想☆11Nov 3, 2024Updated last year
- Math24o: High School Olympiad Mathematics Chinese Benchmark, an evaluation set for high school olympiad mathematics competitions☆11Mar 27, 2025Updated 11 months ago
- ☆60Updated this week
- The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee…☆38Jul 25, 2024Updated last year
- [COLM 2025: 1st Workshop on the Application of LLM Explainability to Reasoning and Planning] Latent Chain-of-Thought? Decoding the Depth-…☆17Oct 4, 2025Updated 5 months ago
- Implementation of Differential Learning Rate in Keras☆11Jun 4, 2019Updated 6 years ago
- Code and Data for EMNLP 2023 Paper "MenatQA: A New Dataset for Testing the Temporal Comprehension and Reasoning Abilities of Large Langu…☆14Apr 7, 2025Updated 10 months ago
- ☆21Sep 7, 2025Updated 5 months ago
- FormulaOne: A dataset of algorithmic problems based on MSO formulas.☆24Aug 14, 2025Updated 6 months ago
- Ant Financial natural language processing competition.☆10Sep 3, 2018Updated 7 years ago
- Introduction to PyTorch: A comprehensive Chinese course available at the provided link.☆10Aug 1, 2023Updated 2 years ago
- Chinese version of Awesome_CV; clone this project to Overleaf to write your own CV with ease☆15May 24, 2024Updated last year
- ☆36Jan 13, 2026Updated last month
- Benchmark tool for comparing cassandra auto MV to manual MV☆11Aug 16, 2016Updated 9 years ago
- ATEC Ant Financial transaction risk prediction, Final_score: 0.7494☆12Oct 19, 2018Updated 7 years ago
- BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution☆58Oct 13, 2025Updated 4 months ago
- NLPCC 2020 MAMS multi-attribute multi-sentiment analysis task, first-place solution☆12Jul 6, 2023Updated 2 years ago
- The official implementation of "Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration"☆24Feb 4, 2026Updated last month
- ☆16Jan 26, 2023Updated 3 years ago
- ☆12Feb 15, 2023Updated 3 years ago
- Daily Chinese tech digest from Karpathy’s 90 curated blogs, with AI ranking, link analysis, and a static web reader. | Based on Karpathy's curated 90 …☆37Feb 19, 2026Updated 2 weeks ago
- [NeurIPS 2025 Spotlight] Implementation of "KLASS: KL-Guided Fast Inference in Masked Diffusion Models"☆23Jan 3, 2026Updated 2 months ago
- MLLM @ Game☆16May 12, 2025Updated 9 months ago
- The official implementation of "ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization…☆16Feb 15, 2024Updated 2 years ago
- Code repository for "RL Grokking Recipe: How RL Unlocks and Transfers New Algorithms in LLMs"☆30Oct 12, 2025Updated 4 months ago
- ☆19Jul 2, 2022Updated 3 years ago
- ☆15Jul 26, 2017Updated 8 years ago
- Logical thinking test questions☆15Aug 29, 2020Updated 5 years ago
- Examples for several neural network types in TensorFlow☆12Aug 8, 2017Updated 8 years ago
- The data for the CRASS-benchmark☆16Oct 24, 2022Updated 3 years ago
- ☆17Jul 2, 2024Updated last year