tajwarfahim / paprika
Official Code Release for "Training a Generally Curious Agent"
☆25 · Updated last month
Alternatives and similar repositories for paprika
Users who are interested in paprika are comparing it to the repositories listed below.
- Verifiers for LLM Reinforcement Learning ☆60 · Updated 2 months ago
- ☆20 · Updated this week
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 · Updated 9 months ago
- The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆35 · Updated last week
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆17 · Updated last month
- ☆48 · Updated last week
- Simple repository for training small reasoning models ☆31 · Updated 4 months ago
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆42 · Updated this week
- ☆32 · Updated last month
- ☆51 · Updated 7 months ago
- ☆24 · Updated 9 months ago
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode… ☆45 · Updated 2 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆91 · Updated 2 months ago
- ☆115 · Updated 4 months ago
- ☆32 · Updated 5 months ago
- ☆21 · Updated 6 months ago
- ☆50 · Updated 3 weeks ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆53 · Updated 4 months ago
- accompanying material for sleep-time compute paper ☆93 · Updated last month
- ☆18 · Updated 2 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆68 · Updated 3 months ago
- Reinforcing General Reasoning without Verifiers ☆60 · Updated last week
- ☆60 · Updated 2 weeks ago
- Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF ☆19 · Updated 8 months ago
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback. ☆27 · Updated 2 months ago
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval ☆39 · Updated 7 months ago
- A repository for research on medium sized language models. ☆76 · Updated last year
- ☆20 · Updated 3 weeks ago
- ☆47 · Updated 3 weeks ago
- ☆27 · Updated 2 weeks ago