JLZhong23 / awesome-reward-models
☆52Updated last week
Alternatives and similar repositories for awesome-reward-models
Users who are interested in awesome-reward-models are comparing it to the repositories listed below.
- ☆169Updated this week
- ☆131Updated 3 weeks ago
- Awesome-Efficient-Inference-for-LRMs is a collection of state-of-the-art, novel, exciting, token-efficient methods for Large Reasoning Mo…☆65Updated this week
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$☆45Updated 7 months ago
- Accepted LLM Papers in NeurIPS 2024☆37Updated 7 months ago
- A Survey on the Honesty of Large Language Models☆57Updated 5 months ago
- 😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond☆237Updated this week
- ☆45Updated last month
- A comprehensive collection of process reward models.☆88Updated 2 weeks ago
- ☆33Updated last week
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆73Updated this week
- A paper list on LLMs and multimodal LLMs☆40Updated last week
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".☆54Updated 6 months ago
- A research repo for experiments about Reinforcement Finetuning☆47Updated 2 months ago
- RM-R1: Unleashing the Reasoning Potential of Reward Models☆97Updated this week
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains…☆221Updated this week
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆69Updated 3 months ago
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space.☆107Updated this week
- ☆42Updated 3 months ago
- ACL 2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs, and the preprint SoftCoT++: Test-Time Scaling with Soft Chain-of…☆21Updated last week
- Survey on Data-centric Large Language Models☆83Updated 10 months ago
- [ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation"☆56Updated 5 months ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"☆123Updated this week
- Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents☆107Updated last month
- Repository for Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning☆164Updated last year
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".☆120Updated 7 months ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"☆65Updated last month
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…☆69Updated last week
- This is the official GitHub repository for our survey paper "Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language …☆55Updated 3 weeks ago
- Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent."☆66Updated 6 months ago