sail-sg / feedback-conditional-policyLinks

Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"

☆53

Alternatives and similar repositories for feedback-conditional-policy

Users that are interested in feedback-conditional-policy are comparing it to the libraries listed below

Sorting:

sail-sg / AnytimeReasoner
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
☆48Updated 4 months ago
UCSB-NLP-Chang / ThinkPrune
☆45Updated 2 months ago
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆69Updated 4 months ago
multimodal-art-projection / TreePO
☆50Updated last month
test-time-interaction / TTI
☆65Updated 5 months ago
sail-sg / VeriFree
Reinforcing General Reasoning without Verifiers
☆92Updated 5 months ago
TIGER-AI-Lab / Hierarchical-Reasoner
Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning
☆49Updated last month
LAMDASZ-ML / Self-Backtracking
☆51Updated 9 months ago
zhijie-group / SIFT
SIFT: Grounding LLM Reasoning in Contexts via Stickers
☆57Updated 8 months ago
hkust-nlp / B-STaR
B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
☆86Updated 6 months ago
zjunlp / LightThinker
[EMNLP 2025] LightThinker: Thinking Step-by-Step Compression
☆123Updated 7 months ago
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆44Updated 7 months ago
yuleiqin / RAIF
A Recipe for Building LLM Reasoners to Solve Complex Instructions
☆29Updated last month
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆89Updated 2 weeks ago
ryoungj / BoLT
Code for "Reasoning to Learn from Latent Thoughts"
☆122Updated 8 months ago
uservan / ThinkPO
☆17Updated 4 months ago
fangyuan-ksgk / CoT-Reasoning-without-Prompting
Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting
☆34Updated last year
NuoJohnChen / JudgeLRM
JudgeLRM: Large Reasoning Models as a Judge
☆40Updated 2 months ago
YangLing0818 / SuperCorrect-llm
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
☆83Updated 8 months ago
mathllm / Step-Controlled_DPO
☆23Updated last year
sail-sg / variational-reasoning
Code for "Variational Reasoning for Language Models"
☆52Updated 2 months ago
MingLiiii / Layer_Gradient
[ACL'25 Oral] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
☆74Updated 5 months ago
jinzhuoran / RAG-RewardBench
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
☆16Updated 11 months ago
PKU-ML / LongPPL
Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"
☆105Updated last month
NineAbyss / S2R
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
☆72Updated 7 months ago
mandyyyyii / east
☆20Updated 4 months ago
yayayacc / MUR
☆46Updated 2 months ago
Optimization-AI / DisCO
Discriminative Constrained Optimization for Reinforcing Large Reasoning Models
☆46Updated 3 weeks ago
YujunZhou / EVOL-RL
Code for Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation (EVOL-RL).
☆41Updated last month
luka-group / vlm-knowledge-conflict
Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."
☆49Updated last year