ezelikman / STaRLinks

Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022)

☆218

Alternatives and similar repositories for STaR

Users that are interested in STaR are comparing it to the libraries listed below

Sorting:

YuxiXie / MCTS-DPO
This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.
☆327Updated last year
waterhorse1 / LLM_Tree_Search
(ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training
☆283Updated last year
kyegomez / Lets-Verify-Step-by-Step
"Improving Mathematical Reasoning with Process Supervision" by OPENAI
☆113Updated last month
Ber666 / RAP
Reasoning with Language Model is Planning with World Model
☆180Updated 2 years ago
MARIO-Math-Reasoning / Super_MARIO
☆341Updated 5 months ago
YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆198Updated 7 months ago
xingyaoww / mint-bench
Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…
☆134Updated last year
allenai / FineGrainedRLHF
☆280Updated 10 months ago
allenai / reward-bench
RewardBench: the first evaluation tool for reward models.
☆660Updated 5 months ago
OFA-Sys / gsm8k-ScRel
Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
☆267Updated last year
QwenLM / ProcessBench
Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"
☆179Updated 6 months ago
kanishkg / cognitive-behaviors
☆216Updated 8 months ago
SuperBruceJia / Awesome-LLM-Self-Consistency
Awesome LLM Self-Consistency: a curated list of Self-consistency in Large Language Models
☆113Updated 4 months ago
IBM / SALMON
Self-Alignment with Principle-Following Reward Models
☆169Updated 2 months ago
TIGER-AI-Lab / Program-of-Thoughts
Data and Code for Program of Thoughts [TMLR 2023]
☆292Updated last year
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆124Updated last year
sail-sg / CPO
[NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.
☆132Updated 8 months ago
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆167Updated 8 months ago
Ber666 / ToolkenGPT
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral)
☆264Updated last year
Yifan-Song793 / ETO
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
☆159Updated last year
eddycmu / demystify-long-cot
☆327Updated 6 months ago
da03 / implicit_chain_of_thought
☆139Updated last year
hkust-nlp / AgentBoard
An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]
☆366Updated last year
rxlqn / awesome-llm-self-reflection
augmented LLM with self reflection
☆135Updated 2 years ago
voidism / DoLa
Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models"
☆524Updated 10 months ago
zankner / CLoud
Critique-out-Loud Reward Models
☆70Updated last year
Linear95 / SPAG
Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024
☆141Updated 9 months ago
Cohere-Labs-Community / parameter-efficient-moe
☆272Updated 2 years ago
BrendanGraham14 / mcts-llm
☆130Updated last year
jayelm / gisting
Learning to Compress Prompts with Gist Tokens - https://arxiv.org/abs/2304.08467
☆300Updated 9 months ago