He-Ren/OJBench

☆32

Alternatives and similar repositories for OJBench

Users who are interested in OJBench are comparing it to the repositories listed below.

chenllliang / MMEvalPro
[NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs
☆25 · Sep 26, 2024 · Updated last year
banksy23 / XCoder
☆36 · Jul 7, 2025 · Updated last year
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆28 · Dec 23, 2024 · Updated last year
KbsdJames / Omni-MATH
The official repository of the Omni-MATH benchmark.
☆94 · Dec 22, 2024 · Updated last year
F2-Song / Weak-to-Strong-Decoding
The official implementation of "Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding"
☆22 · Jun 26, 2025 · Updated last year
NEUIR / COAST
Official repository for the paper "COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis".
☆18 · Feb 19, 2025 · Updated last year
sylvain-wei / 24-Game-Reasoning
A very simple reproduction of Deepseek-R1-Zero and Deepseek-R1, using the 24 Game as an example. It applies zero-RL, SFT, and SFT+RL to elicit the LLM's self-verification and reflection abilities. Clean, minimal, accessible reproduction of Dee…
☆35 · Apr 5, 2025 · Updated last year
Asthestarsfalll / Sparse_MultiLabel_Categorical_CrossEntropy
Sparse Multilabel Categorical Crossentropy
☆11 · Sep 10, 2023 · Updated 2 years ago
KbsdJames / MATH-Minos
The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee…
☆38 · Jul 25, 2024 · Updated 2 years ago
CONE-MT / Lego-MT
☆10 · Mar 22, 2024 · Updated 2 years ago
neulab / SWE-Playground
Official Repository for "Training Versatile Coding Agents in Synthetic Environments"
☆22 · Jan 11, 2026 · Updated 6 months ago
yangzhch6 / DARS
The official implementation of "Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration" (ICML 2026)
☆24 · Feb 4, 2026 · Updated 5 months ago
sylvain-wei / TIME
[NeurIPS 2025 D&B (Spotlight🌟)] TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenario
☆32 · Oct 5, 2025 · Updated 9 months ago
QwenLM / Self-Lengthen
☆98 · Nov 6, 2024 · Updated last year
microsoft / LEMA
Official repo for the paper "Learning From Mistakes Makes LLM Better Reasoner"
☆60 · Dec 20, 2023 · Updated 2 years ago
hit-mitlab / Patent-LLaMA
☆24 · Jan 2, 2024 · Updated 2 years ago
KbsdJames / Awesome-LLM-Preference-Learning
The official repository of our survey paper: "Towards a Unified View of Preference Learning for Large Language Models: A Survey"
☆192 · Oct 28, 2024 · Updated last year
tjake / mvbench
Benchmark tool for comparing Cassandra auto MV to manual MV
☆11 · Aug 16, 2016 · Updated 9 years ago
F2-Song / ICDPO
The official implementation of "ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization…
☆16 · Feb 15, 2024 · Updated 2 years ago
RUCAIBox / FIGA
[ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment"
☆10 · May 5, 2024 · Updated 2 years ago
WenRichard / ELMO-NLP
Applications of ELMO to NLP tasks such as question answering and text classification
☆15 · Apr 13, 2019 · Updated 7 years ago
LCO-Embedding / LCO-Embedding
[NeurIPS 2025] Scaling Language-centric Omnimodal Representation Learning
☆48 · Apr 13, 2026 · Updated 3 months ago
open-compass / CIBench
Official Repo of "CIBench: Evaluation of LLMs as Code Interpreter"
☆15 · Jul 19, 2024 · Updated 2 years ago
yulu-dada / Attention-calibration-NMT
☆13 · May 15, 2021 · Updated 5 years ago
GavinZhengOI / LiveCodeBench-Pro
☆176 · Dec 13, 2025 · Updated 7 months ago
agentica-project / verl-pipeline
Async pipelined version of Verl
☆124 · Apr 8, 2025 · Updated last year
CLUEbenchmark / Math24o
Math24o: High School Olympiad Mathematics Chinese Benchmark, an evaluation set for high school mathematics olympiad competitions
☆14 · Mar 27, 2025 · Updated last year
MiniMax-AI / SynLogic
[NeurIPS 2025] The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond
☆203 · Jul 7, 2025 · Updated last year
Awesome-LogCodeSecurity-LLMs / Awesome-LogCodeSecurity-LLMs
☆10 · Nov 14, 2024 · Updated last year
icip-cas / awesome-auto-alignment
Collection of papers for scalable automated alignment.
☆92 · Oct 22, 2024 · Updated last year
thu-coai / ComplexBench
Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)
☆102 · Feb 20, 2025 · Updated last year
morning9393 / ETPO
☆14 · Mar 5, 2024 · Updated 2 years ago
MrZhengXin / multi_intent_2022
☆13 · Feb 15, 2023 · Updated 3 years ago
philschmid / llmperf
LLMPerf is a library for validating and benchmarking LLMs
☆11 · Aug 13, 2024 · Updated last year
allenai / essential-terms
"Learning What is Essential in Questions", CoNLL, 2017
☆26 · Aug 3, 2018 · Updated 7 years ago
gunchagarg / differential-learning-rate-keras
Implementation of Differential Learning Rate in Keras
☆11 · Jun 4, 2019 · Updated 7 years ago
weiyifan1023 / MenatQA
Code and Data for EMNLP 2023 Paper "MenatQA: A New Dataset for Testing the Temporal Comprehension and Reasoning Abilities of Large Langu…
☆14 · Apr 7, 2025 · Updated last year
Yifan-Song793 / InfoCL
Findings of EMNLP 2023: InfoCL: Alleviating Catastrophic Forgetting in Continual Text Classification from An Information Theoretic Perspe…
☆14 · Aug 13, 2024 · Updated last year
leolle / atec_nlp
Ant Financial natural language processing competition.
☆10 · Sep 3, 2018 · Updated 7 years ago