modelscope / OpenJudge

OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards

☆117

Alternatives and similar repositories for OpenJudge

Users who are interested in OpenJudge are comparing it to the repositories listed below.

qiancheng0 / ToolRL
☆404Updated 2 months ago
modelscope / Trinity-RFT
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (…
☆464Updated this week
a-m-team / a-m-models
a-m-team's exploration in large language modeling
☆195Updated 7 months ago
pengr / LLM-Synthetic-Data
A live reading list for LLM data synthesis (Updated to July, 2025).
☆434Updated 4 months ago
wjn1996 / Awesome-LLM-Reasoning-Openai-o1-Survey
Related work and background techniques for OpenAI o1
☆221Updated last year
OFA-Sys / InsTag
InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning
☆283Updated 2 years ago
chenchen0103 / ACEBench
☆153Updated 2 months ago
ADaM-BJTU / OpenRFT
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
☆154Updated last year
Mryangkaitong / deepseek-r1-gsm8k
☆47Updated 10 months ago
tianyi-lab / Cherry_LLM
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…
☆412Updated 6 months ago
HarderThenHarder / RLLoggingBoard
A visualization tool for deeper understanding and easier debugging of RLHF training.
☆275Updated 10 months ago
bytarnish / AGILE
☆161Updated 11 months ago
pldlgb / nuggets
☆87Updated 2 years ago
RUC-NLPIR / Tool-Star
🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning
☆300Updated 2 months ago
GAIR-NLP / ToRL
☆325Updated 7 months ago
PALIN2018 / BrowseComp-ZH
☆132Updated 7 months ago
RyanLiu112 / GenPRM
[AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
☆92Updated last month
xbench-ai / xbench-evals
Evergreen, contamination-free, real-world, domain-specific AI evaluation framework
☆114Updated 2 months ago
QwenLM / AutoIF
☆318Updated last year
GAIR-NLP / cognition-engineering
Generative AI Act II: Test Time Scaling Drives Cognition Engineering
☆209Updated 8 months ago
RUCAIBox / SimpleDeepSearcher
SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis
☆114Updated 7 months ago
LivingFutureLab / ChineseSimpleQA
☆77Updated 11 months ago
sail-sg / sdft
[ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".
☆139Updated last year
CASIA-LM / MoDS
☆147Updated last year
LCLM-Horizon / A-Comprehensive-Survey-For-Long-Context-Language-Modeling
A Comprehensive Survey on Long Context Language Modeling
☆216Updated last month
nuochenpku / Awesome-Role-Play-Papers
Awesome papers for role-playing with language models
☆216Updated last year
ADaM-BJTU / AutoCoA
AutoCoA (Automatic generation of Chain-of-Action) is an agent model framework that enhances the multi-turn tool usage capability of reaso…
☆131Updated 9 months ago
SkyworkAI / Skywork-Reward-V2
Scaling Preference Data Curation via Human-AI Synergy
☆135Updated 6 months ago
SuperGPQA / SuperGPQA
☆178Updated 8 months ago
IAAR-Shanghai / xVerify
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
☆143Updated last month