Linear95 / SPAG

Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024

☆137

Alternatives and similar repositories for SPAG

Users who are interested in SPAG are comparing it to the repositories listed below.

architsharma97 / dpo-rlaif
☆99 Updated last year
IBM / SALMON
Self-Alignment with Principle-Following Reward Models
☆162 Updated 2 months ago
openai / safety-rbr-code-and-data
Code and example data for the paper: Rule Based Rewards for Language Model Safety
☆190 Updated last year
TIGER-AI-Lab / CritiqueFineTuning
Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025]
☆169 Updated 3 weeks ago
vwxyzjn / summarize_from_feedback_details
☆147 Updated 8 months ago
hkust-nlp / llm-compression-intelligence
Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
☆139 Updated 10 months ago
WindyLee0822 / Process_Q_Model
official implementation of paper "Process Reward Model with Q-value Rankings"
☆60 Updated 6 months ago
THUDM / T1
RL Scaling and Test-Time Scaling (ICML'25)
☆109 Updated 6 months ago
da03 / Internalize_CoT_Step_by_Step
☆187 Updated 3 months ago
Linear95 / APO
Code for ACL2024 paper - Adversarial Preference Optimization (APO).
☆56 Updated last year
jwhj / OREO
☆114 Updated 6 months ago
da03 / implicit_chain_of_thought
☆135 Updated 8 months ago
hkust-nlp / B-STaR
B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
☆82 Updated 2 months ago
chujiezheng / LLM-Extrapolation
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆75 Updated 2 months ago
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆123 Updated 10 months ago
SalesforceAIResearch / LaTRO
☆118 Updated 5 months ago
Ber666 / RAP
Reasoning with Language Model is Planning with World Model
☆168 Updated last year
Berkeley-NLP / Agent-Eval-Refine
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]
☆139 Updated 8 months ago
Yu-Fangxu / FoR
[ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
☆103 Updated last week
allenai / WildBench
Benchmarking LLMs with Challenging Tasks from Real Users
☆233 Updated 9 months ago
ScalingIntelligence / large_language_monkeys
☆101 Updated 10 months ago
kyegomez / Lets-Verify-Step-by-Step
"Improving Mathematical Reasoning with Process Supervision" by OPENAI
☆112 Updated 2 weeks ago
JacobPfau / fillerTokens
☆67 Updated last year
WildEval / ZeroEval
A simple unified framework for evaluating LLMs
☆235 Updated 3 months ago
McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆167 Updated 2 months ago
hughbzhang / o1_inference_scaling_laws
Replicating O1 inference-time scaling laws
☆89 Updated 8 months ago
joeljang / RLPHF
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
☆108 Updated last year
zitian-gao / SC-MCTS
Interpretable Contrastive Monte Carlo Tree Search Reasoning
☆48 Updated 8 months ago
waterhorse1 / LLM_Tree_Search
(ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training
☆278 Updated last year
YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆184 Updated 3 months ago