chr26195 / PENCIL
This is the official implementation for the paper "PENCIL: Long Thoughts with Short Memory".
☆45Updated 3 weeks ago
Alternatives and similar repositories for PENCIL
Users who are interested in PENCIL are comparing it to the libraries listed below.
- Code for Paper: Learning Adaptive Parallel Reasoning with Language Models☆96Updated last month
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling☆104Updated 4 months ago
- ☆114Updated 4 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples☆89Updated last week
- Natural Language Reinforcement Learning☆89Updated 5 months ago
- FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models☆49Updated last week
- Interpretable Contrastive Monte Carlo Tree Search Reasoning☆48Updated 6 months ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆91Updated 2 months ago
- Code for "Reasoning to Learn from Latent Thoughts"☆104Updated 2 months ago
- MARFT stands for Multi-Agent Reinforcement Fine-Tuning. This repository implements an LLM-based multi-agent reinforcement fine-tuning fra…☆35Updated 2 weeks ago
- official implementation of paper "Process Reward Model with Q-value Rankings"☆59Updated 4 months ago
- ☆104Updated last month
- The official implementation of Self-Exploring Language Models (SELM)☆64Updated last year
- Repo for "Z1: Efficient Test-time Scaling with Code"☆59Updated last month
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI☆107Updated 2 weeks ago
- ☆45Updated 3 months ago
- Repo of paper "Free Process Rewards without Process Labels"☆149Updated 2 months ago
- Implementation of the Quiet-STAR paper (https://arxiv.org/pdf/2403.09629.pdf)☆54Updated 9 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆95Updated 2 months ago
- ☆231Updated last week
- ☆49Updated 3 weeks ago
- Reinforcing General Reasoning without Verifiers☆51Updated last week
- Verifiers for LLM Reinforcement Learning☆56Updated last month
- SIFT: Grounding LLM Reasoning in Contexts via Stickers☆56Updated 3 months ago
- ☆93Updated 8 months ago
- Official Repo for InSTA: Towards Internet-Scale Training For Agents☆42Updated last week
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners☆82Updated 2 weeks ago
- ☆32Updated 3 months ago
- Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆208Updated last month
- A Sober Look at Language Model Reasoning☆63Updated last week