RUCKBReasoning / CodeRM
Official code implementation for the ACL 2025 paper: 'Dynamic Scaling of Unit Tests for Code Reward Modeling'
☆23Updated 3 months ago
Alternatives and similar repositories for CodeRM
Users that are interested in CodeRM are comparing it to the libraries listed below
- ☆22Updated last year
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆16Updated 8 months ago
- ☆45Updated last month
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25]☆86Updated 4 months ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning?☆30Updated 3 weeks ago
- ☆47Updated 6 months ago
- Codebase for Instruction Following without Instruction Tuning☆35Updated 11 months ago
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433 ☆28Updated 8 months ago
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"☆38Updated last year
- JudgeLRM: Large Reasoning Models as a Judge☆32Updated 4 months ago
- ☆40Updated 4 months ago
- ☆18Updated 3 weeks ago
- Repo for "Z1: Efficient Test-time Scaling with Code"☆64Updated 4 months ago
- Code for paper: Long cOntext aliGnment via efficient preference Optimization☆13Updated 6 months ago
- ReasonFlux-Coder: Open-Source LLM Coders with Co-Evolving Reinforcement Learning☆108Updated last month
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction☆77Updated 5 months ago
- A Recipe for Building LLM Reasoners to Solve Complex Instructions☆21Updated 3 weeks ago
- RM-R1: Unleashing the Reasoning Potential of Reward Models☆126Updated 2 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning☆108Updated 3 months ago
- SIFT: Grounding LLM Reasoning in Contexts via Stickers☆58Updated 5 months ago
- ☆28Updated 10 months ago
- Process Reward Models That Think☆48Updated last month
- ☆26Updated 4 months ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"☆69Updated 4 months ago
- ☆127Updated last week
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards☆44Updated 4 months ago
- Official code implementation for the ACL 2025 paper: 'CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis'☆26Updated 3 months ago
- ☆56Updated 2 months ago
- [ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style☆59Updated last month
- ☆49Updated 10 months ago