damanimehul/RLCR

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/damanimehul/RLCR)

damanimehul / RLCR

Official repository for Beyond Binary Rewards: Training LMs to Reason about Their Uncertainty

☆68

Alternatives and similar repositories for RLCR

Users that are interested in RLCR are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

shuoli90 / Rank-Calibration
View on GitHub
This is the repo for constructing a comprehensive and rigorous evaluation framework for LLM calibration.
☆14Apr 9, 2024Updated 2 years ago
emmyqin / iw_sft
View on GitHub
☆28Jul 18, 2025Updated last year
dobriban / Principles-of-AI-LLMs
View on GitHub
Materials for the course Principles of AI: LLMs at UPenn (Stat 9911, Spring 2025). LLM architectures, training paradigms (pre- and post-t…
☆48Jun 14, 2025Updated last year
yasumasaonoe / ecbd
View on GitHub
☆11Apr 23, 2023Updated 3 years ago
sail-sg / AnytimeReasoner
View on GitHub
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
☆54Jul 15, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
princeton-pli / what-makes-good-rm
View on GitHub
[NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective
☆44Sep 18, 2025Updated 10 months ago
QingyangZhang / TEMPO
View on GitHub
Scaling Test-time Training for LLM Reasoning
☆27Apr 14, 2026Updated 3 months ago
XMUDeepLIT / MNMT
View on GitHub
Code for "Multi-Modal Neural Machine Translation with Deep Semantic Interactions" (Information Sciences)
☆16May 21, 2021Updated 5 years ago
facebookresearch / AbstentionBench
View on GitHub
A holistic benchmark for LLM abstention
☆87Aug 27, 2025Updated 11 months ago
nathanhu0 / CaMeLS
View on GitHub
Codebase for Context-aware Meta-learned Loss Scaling (CaMeLS). https://arxiv.org/abs/2305.15076.
☆26Jan 23, 2024Updated 2 years ago
falonss703 / Awesome-Uncertainty-based-Reinforcement-Learning
View on GitHub
🔥🔥🔥Latest Papers, Codes on Uncertainty-based RL
☆58Aug 24, 2025Updated 11 months ago
probabilistic-inference-scaling / probabilistic-inference-scaling
View on GitHub
☆52Mar 17, 2025Updated last year
formll / resolving-scaling-law-discrepancies
View on GitHub
☆19Nov 4, 2025Updated 8 months ago
BlueWhaleLab / COME
View on GitHub
[ICLR 2025] COME: Test-time Adaption by Conservatively Minimizing Entropy
☆23Mar 5, 2025Updated last year
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
Tsinghua-dhy / EDC-2-RAG
View on GitHub
☆19Nov 3, 2025Updated 8 months ago
QingyangZhang / EMPO
View on GitHub
[NeurIPS25 Spotlight] EMPO, A Fully Unsupervised RLVR Method
☆103Nov 24, 2025Updated 8 months ago
grasp-lyrl / low-dimensional-deepnets
View on GitHub
☆19Mar 31, 2024Updated 2 years ago
YujieLu10 / IACE-NLU
View on GitHub
Official repo for "Imagination-Augmented Natural Language Understanding", NAACL 2022.
☆17Aug 30, 2022Updated 3 years ago
ZhangXJ199 / EDGE-GRPO
View on GitHub
Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity
☆22Aug 28, 2025Updated 11 months ago
XMUDeepLIT / CG-RL
View on GitHub
Code for "Exploring Dynamic Selection of Branch Expansion Orders for Code Generation" (ACL 2021)
☆31Apr 11, 2022Updated 4 years ago
hltcoe / rank-k
View on GitHub
Repository for the listwise reranker Rank-K
☆16May 23, 2025Updated last year
XMUDeepLIT / NSEG
View on GitHub
Code for “Graph based Neural Sentence Ordering” (IJCAI2019)
☆31Aug 30, 2019Updated 6 years ago
bcdnlp / PRD
View on GitHub
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
☆12Apr 21, 2024Updated 2 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
launchnlp / BOLT
View on GitHub
Code for ACL 2023 paper "BOLT: Fast Energy-based Controlled Text Generation with Tunable Biases".
☆22Sep 7, 2023Updated 2 years ago
GaotangLi / Beyond-Log-Likelihood
View on GitHub
[ICML'26 Spotlight] What is the right loss function for LLM supervised finetuning?
☆66May 28, 2026Updated 2 months ago
tmlr-group / NoisyRationales
View on GitHub
[NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?"
☆41Jul 18, 2025Updated last year
montehoover / DynaGuard
View on GitHub
Code for "DynaGuard: A Dynamic Guardrail Model With User-Defined Policies."
☆23Nov 3, 2025Updated 8 months ago
multimodal-art-projection / TreePO
View on GitHub
☆65Mar 30, 2026Updated 3 months ago
Qwen-Applications / GD2PO
View on GitHub
☆20Jun 16, 2026Updated last month
FreedomIntelligence / Med-MAT
View on GitHub
[ACL 2025] Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging
☆40Jun 4, 2025Updated last year
MiaoXiong2320 / ProximityBias-Calibration
View on GitHub
☆19Nov 11, 2023Updated 2 years ago
ml-jku / SDLG
View on GitHub
SDLG is an efficient method to accurately estimate aleatoric semantic uncertainty in LLMs
☆28Jun 7, 2024Updated 2 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
mstaib / mmd-dro-code
View on GitHub
Accompanying code for our NeurIPS 2019 paper
☆11Nov 7, 2019Updated 6 years ago
BlueWhaleLab / DCScore
View on GitHub
☆13May 23, 2025Updated last year
UCSB-NLP-Chang / ThinkPrune
View on GitHub
☆46Sep 27, 2025Updated 10 months ago
xhwang22 / Awesome-Reward-Hacking
View on GitHub
A curated list of papers and resources on Reward Hacking, Emergent Misalignment, and Proxy Exploitation in Large Models
☆43Apr 17, 2026Updated 3 months ago
leopoldwhite / Awesome-Inference-Time-Trustworthiness
View on GitHub
☆15May 15, 2026Updated 2 months ago
MingxuanZhangPurdue / multiagent-vulnerable
View on GitHub
☆17Jan 11, 2026Updated 6 months ago
zhangir-azerbayev / repl
View on GitHub
A simple REPL for Lean 4, returning information about errors and sorries.
☆12Jun 19, 2023Updated 3 years ago