tajwarfahim / paprika
Official Code Release for "Training a Generally Curious Agent"
☆19 · Updated 3 weeks ago
Alternatives and similar repositories for paprika:
Users who are interested in paprika are comparing it to the libraries listed below.
- ☆20 · Updated 3 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 · Updated 6 months ago
- ☆48 · Updated 4 months ago
- A repository for research on medium sized language models. ☆76 · Updated 10 months ago
- ☆30 · Updated 2 months ago
- Exploration of automated dataset selection approaches at large scales. ☆33 · Updated 3 weeks ago
- ☆16 · Updated 3 weeks ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆38 · Updated 5 months ago
- A testbed for agents and environments that can automatically improve models through data generation. ☆21 · Updated 3 weeks ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆23 · Updated last week
- ☆27 · Updated last week
- Lottery Ticket Adaptation ☆38 · Updated 4 months ago
- ☆24 · Updated 6 months ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆75 · Updated 2 weeks ago
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family. ☆28 · Updated last month
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆71 · Updated 7 months ago
- ☆32 · Updated 9 months ago
- ☆43 · Updated last year
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆26 · Updated 2 weeks ago
- Implementation of "SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models" ☆26 · Updated last month
- ReBase: Training Task Experts through Retrieval Based Distillation ☆28 · Updated last month
- ☆111 · Updated last month
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning ☆48 · Updated last month
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval ☆32 · Updated 4 months ago
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆83 · Updated last week
- Understanding the correlation between different LLM benchmarks ☆29 · Updated last year
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode… ☆37 · Updated last month
- ☆25 · Updated 11 months ago
- ☆60 · Updated last month
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆80 · Updated last month