aorwall / moatless-tree-search
☆86Updated 2 weeks ago
Alternatives and similar repositories for moatless-tree-search
Users who are interested in moatless-tree-search are comparing it to the libraries listed below.
- ☆97Updated last month
- Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents☆76Updated 2 weeks ago
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 SRW☆63Updated 8 months ago
- ☆41Updated 4 months ago
- Code for the paper: CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models☆22Updated 2 months ago
- ☆40Updated 11 months ago
- Scaling Data for SWE-agents☆256Updated this week
- Systematic evaluation framework that automatically rates overthinking behavior in large language models.☆90Updated last month
- RepoQA: Evaluating Long-Context Code Understanding☆109Updated 7 months ago
- accompanying material for sleep-time compute paper☆93Updated last month
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git…☆12Updated 2 months ago
- [ACL'25 Findings] SWE-Dev is an SWE agent with a scalable test case construction pipeline.☆40Updated last week
- SWE Arena☆34Updated 2 months ago
- ☆36Updated last month
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples☆93Updated 2 weeks ago
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts?☆55Updated 4 months ago
- [FORGE 2025] Graph-based method for end-to-end code completion with context awareness on repository☆63Updated 9 months ago
- A Comprehensive Benchmark for Software Development.☆108Updated last year
- DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents☆82Updated this week
- ☆46Updated last year
- Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory☆62Updated 3 weeks ago
- 🚀 SWE-bench Goes Live!☆65Updated last week
- Replicating O1 inference-time scaling laws☆87Updated 6 months ago
- ☆158Updated 9 months ago
- ☆115Updated 4 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆57Updated 6 months ago
- [NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?☆125Updated 9 months ago
- ☆97Updated 11 months ago
- Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement☆100Updated 4 months ago
- ☆43Updated 2 months ago