bobxwu/learning-from-rewards-llm-papers

A comprehensive collection of papers on learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward models and learning strategies across the training, inference, and post-inference stages.

☆73

Alternatives and similar repositories for learning-from-rewards-llm-papers

Users who are interested in learning-from-rewards-llm-papers are comparing it to the repositories listed below.

rhyang2021 / ARIA
Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation".
☆30 · Aug 9, 2025 · Updated 11 months ago
smallporridge / TrustworthyRAG
☆16 · May 18, 2026 · Updated last month
WxxShirley / KDD2024ProCom
Codes and data for KDD 2024 Research Track paper "ProCom: A Few-shot Targeted Community Detection Algorithm"
☆11 · Aug 15, 2024 · Updated last year
chen700564 / causalFSED
☆16 · Nov 19, 2021 · Updated 4 years ago
yueshengbin / SMART
[AAAI 2025 Oral] Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks
☆31 · Apr 14, 2025 · Updated last year
YuyaoZhangQAQ / QCompiler
This repository contains the code for the paper “Neuro-Symbolic Query Compiler”, accepted to the Findings of ACL 2025.
☆17 · Oct 20, 2025 · Updated 8 months ago
RUC-NLPIR / HiRA
The code for paper: Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search [SIGIR 2026]
☆64 · Jul 4, 2025 · Updated last year
THU-KEG / VerIF
[EMNLP 2025] Verification Engineering for RL in Instruction Following
☆57 · Mar 30, 2026 · Updated 3 months ago
kaistAI / Janus
[NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages
☆53 · Aug 10, 2025 · Updated 11 months ago
SWE-EVO / SWE-EVO
☆50 · May 3, 2026 · Updated 2 months ago
RM-R1-UIUC / RM-R1
[ICLR'26] RM-R1: Unleashing the Reasoning Potential of Reward Models
☆165 · Jun 26, 2025 · Updated last year
TIGER-AI-Lab / General-Reasoner
General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]
☆227 · Nov 27, 2025 · Updated 7 months ago
SalesforceAIResearch / UserBench
☆63 · Jun 2, 2026 · Updated last month
MozerWang / AMPO
[ICLR 2026] Adaptive Social Learning via Mode Policy Optimization for Language Agents
☆51 · Feb 2, 2026 · Updated 5 months ago
F2-Song / Weak-to-Strong-Decoding
The official implementation of "Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding"
☆22 · Jun 26, 2025 · Updated last year
haon-chen / MoCa
☆68 · Aug 14, 2025 · Updated 11 months ago
ibisbill / Transferability-of-LLM-Reasoning
☆111 · Jul 6, 2026 · Updated last week
Adaxry / Unified_Layer_Skipping
☆15 · Apr 11, 2024 · Updated 2 years ago
INK-USC / fewNER
Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER (ACL 2022)
☆44 · Apr 7, 2022 · Updated 4 years ago
mlvlab / DeepVideoR1
[NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1"
☆35 · Feb 22, 2026 · Updated 4 months ago
haon-chen / mmE5
☆59 · Feb 27, 2025 · Updated last year
UKPLab / arxiv2025-inherent-limits-plms
Code repository for the paper "The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Le…
☆14 · Jan 16, 2025 · Updated last year
DeepExperience / LoopTool
☆67 · Dec 10, 2025 · Updated 7 months ago
xlyu0106 / ViF
[ICLR 26] Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
☆44 · Oct 3, 2025 · Updated 9 months ago
rhyang2021 / LoGU
Source code for our paper: "LoGU: Long-form Generation with Uncertainty Expressions".
☆19 · May 27, 2025 · Updated last year
JingMog / THOR
[ICLR-2026] Official Implementation of our paper "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning".
☆33 · Feb 26, 2026 · Updated 4 months ago
52paper / 52paper.github.io
☆75 · Sep 1, 2022 · Updated 3 years ago
lynnliu030 / artifact-eval
☆13 · Apr 9, 2025 · Updated last year
caiqizh / LUQ
☆14 · Jan 14, 2026 · Updated 6 months ago
Jacob-Zhou / gecdi
The repo of "Improving Seq2Seq Grammatical Error Correction via Decoding Interventions"
☆32 · Jan 22, 2024 · Updated 2 years ago
tigerchen52 / LOVE
ACL22 paper: Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost
☆41 · Nov 15, 2023 · Updated 2 years ago
duyngtr16061999 / KDMCSE
☆10 · Apr 7, 2024 · Updated 2 years ago
illidanlab / ABD
[ICML2023] Revisiting Data-Free Knowledge Distillation with Poisoned Teachers
☆24 · Jul 7, 2024 · Updated 2 years ago
plageon / HierSearch
HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches
☆40 · Oct 9, 2025 · Updated 9 months ago
shuzhangzhong / HybriMoE-Preview
☆17 · Apr 9, 2025 · Updated last year
ERC-ITEA / MuduoLLM
☆82 · Oct 14, 2025 · Updated 9 months ago
qqaatw / pytorch-realm-orqa
PyTorch reimplementation of REALM and ORQA
☆22 · Feb 3, 2022 · Updated 4 years ago
KaiyuanGao / AI-Conference-Papers
☆19 · Oct 10, 2020 · Updated 5 years ago
rickyang1114 / multimodal-deepresearcher
[AAAI 2026] Multimodal Deepresearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework
☆57 · Jun 8, 2026 · Updated last month