lmarena / p2l
Prompt-to-Leaderboard
☆239Updated last month
Alternatives and similar repositories for p2l
Users who are interested in p2l are comparing it to the libraries listed below
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks"☆219Updated last month
- ☆211Updated last month
- Code for the paper: "Learning to Reason without External Rewards"☆295Updated last week
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL☆406Updated 2 weeks ago
- Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"☆551Updated 3 months ago
- AWM: Agent Workflow Memory☆275Updated 4 months ago
- Scaling Data for SWE-agents☆256Updated this week
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆173Updated 5 months ago
- Resources for our paper: "Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training"☆149Updated 2 weeks ago
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling☆395Updated last month
- [ICML 2025 Oral] CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction☆535Updated last month
- SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning☆422Updated this week
- A simple unified framework for evaluating LLMs☆219Updated 2 months ago
- [ICML2025] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction☆318Updated 3 months ago
- Atom of Thoughts for Markov LLM Test-Time Scaling☆574Updated last week
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate"☆159Updated 2 weeks ago
- ☆317Updated 9 months ago
- ☆119Updated last month
- Scaling Deep Research via Reinforcement Learning in Real-world Environments.☆461Updated 2 months ago
- Tina: Tiny Reasoning Models via LoRA☆260Updated 3 weeks ago
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't"☆238Updated last month
- AN O1 REPLICATION FOR CODING☆335Updated 6 months ago
- Benchmarking Chat Assistants on Long-Term Interactive Memory (ICLR 2025)☆115Updated last month
- OS-ATLAS: A Foundation Action Model For Generalist GUI Agents☆351Updated 2 months ago
- Official repository for “Reinforcement Learning for Reasoning in Large Language Models with One Training Example”☆290Updated this week
- ReasonFlux Series - Open-Sourced LLM Family for Reasoning, Coding, Reward Modeling and Data Selection☆409Updated 2 weeks ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.☆173Updated 3 months ago
- A benchmark for LLMs on complicated tasks in the terminal☆177Updated this week
- The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]☆254Updated 3 months ago
- Scaling RL on advanced reasoning models☆100Updated this week