microsoft / stop
Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
☆29 Updated 10 months ago
Related projects
Alternatives and complementary repositories for stop
- ☆112 Updated last month
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆75 Updated last month
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆162 Updated last month
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆87 Updated last year
- Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement ☆47 Updated 3 weeks ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆128 Updated last month
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆48 Updated 7 months ago
- ☆78 Updated 11 months ago
- A benchmark that challenges language models to code solutions for scientific problems ☆87 Updated this week
- Repository for the paper Stream of Search: Learning to Search in Language ☆93 Updated 3 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆62 Updated 5 months ago
- Functional Benchmarks and the Reasoning Gap ☆78 Updated last month
- Implementation of the Quiet-STaR paper (https://arxiv.org/pdf/2403.09629.pdf) ☆42 Updated 3 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆26 Updated last year
- ☆42 Updated 4 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆41 Updated last month
- Codebase accompanying the Summary of a Haystack paper. ☆72 Updated 2 months ago
- Advanced Reasoning Benchmark Dataset for LLMs ☆45 Updated last year
- ☆38 Updated 4 months ago
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback ☆56 Updated 2 months ago
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆47 Updated last month
- Based on the tree of thoughts paper ☆45 Updated last year
- CodeUltraFeedback: aligning large language models to coding preferences ☆65 Updated 4 months ago
- ☆41 Updated 2 weeks ago
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" ☆46 Updated last month
- Can Language Models Solve Olympiad Programming? ☆101 Updated 3 months ago
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts? ☆35 Updated last month
- ☆74 Updated 3 weeks ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆111 Updated last month
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you… ☆57 Updated last month