Danau5tin / calculator_agent_rl

Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.

☆45

Alternatives and similar repositories for calculator_agent_rl

Users who are interested in calculator_agent_rl are comparing it to the libraries listed below.

casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173 · Updated 6 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆55 · Updated 6 months ago
brendanhogan / picoDeepResearch
☆64 · Updated 2 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆103 · Updated 4 months ago
PrimeIntellect-ai / genesys
☆130 · Updated 4 months ago
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆59 · Updated 6 months ago
Alex-Gurung / ReasoningNCP
Official repo for Learning to Reason for Long-Form Story Generation
☆68 · Updated 3 months ago
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119 · Updated last year
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆175 · Updated 4 months ago
arcee-ai / DAM
☆53 · Updated 8 months ago
Pleias / Quest-Best-Tokens
An introduction to LLM Sampling
☆79 · Updated 7 months ago
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆72 · Updated 4 months ago
xjdr-alt / llmri
look how they massacred my boy
☆63 · Updated 9 months ago
haizelabs / j1-micro
j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models.
☆94 · Updated 2 weeks ago
StigLidu / DualDistill
The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"
☆84 · Updated 2 weeks ago
facebookresearch / collaborative-reasoner
Source code for the collaborative reasoner research project at Meta FAIR.
☆99 · Updated 3 months ago
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆103 · Updated 4 months ago
SinatrasC / entropix
Entropy Based Sampling and Parallel CoT Decoding
☆17 · Updated 9 months ago
axeld5 / pali_reason
Testing paligemma2 finetuning on reasoning dataset
☆18 · Updated 7 months ago
gkamradt / SnakeBench
☆88 · Updated last month
tokenbender / avataRL
rl from zero pretrain, can it be done? we'll see.
☆66 · Updated 2 weeks ago
facebookresearch / matrix
Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…
☆81 · Updated this week
Xalp / ECHO
Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)
☆91 · Updated 6 months ago
Mihaiii / backtrack_sampler
An easy-to-understand framework for LLM samplers that rewind and revise generated tokens
☆140 · Updated 5 months ago
allenai / infinigram-api
☆73 · Updated 2 weeks ago
kubernetes-bad / reward-composer
Lego for GRPO
☆28 · Updated 2 months ago
letta-ai / sleep-time-compute
accompanying material for sleep-time compute paper
☆99 · Updated 3 months ago
tyler-romero / microR1
Simple repository for training small reasoning models
☆32 · Updated 5 months ago
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆149 · Updated 6 months ago
xjdr-alt / muzero_sketch
☆38 · Updated last year