keskival / recursive-self-improvement-suite
A suite of open-ended, non-imitative tasks involving generalizable skills for large language model chatbots and agents to enable bootstrapped recursive self-improvement and an unambiguous AGI.
☆31, updated last month
Alternatives and similar repositories for recursive-self-improvement-suite:
Users interested in recursive-self-improvement-suite are comparing it to the libraries listed below.
- Evaluation of neuro-symbolic engines (☆34, updated 5 months ago)
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file (☆154, updated 2 months ago)
- Repository for the paper "Stream of Search: Learning to Search in Language" (☆119, updated 5 months ago)
- Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation (☆38, updated last year)
- Code and Data for "MIRAI: Evaluating LLM Agents for Event Forecasting" (☆57, updated 6 months ago)
- A virtual environment for developing and evaluating automated scientific discovery agents (☆117, updated last week)
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents (☆114, updated 7 months ago)
- Implementation of the Quiet-STaR paper (https://arxiv.org/pdf/2403.09629.pdf) (☆48, updated 5 months ago)
- A benchmark that challenges language models to code solutions for scientific problems (☆97, updated this week)
- Official code for the paper "ADaPT: As-Needed Decomposition and Planning with Language Models" (☆73, updated last year)
- Can Language Models Solve Olympiad Programming? (☆108, updated this week)
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" (☆77, updated 3 months ago)
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search (☆68, updated last month)
- Functional Benchmarks and the Reasoning Gap (☆82, updated 3 months ago)
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments (☆68, updated 3 months ago)
- Based on the tree of thoughts paper (☆46, updated last year)
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts (☆39, updated 2 months ago)
- A library for benchmarking the long-term memory and continual learning capabilities of LLM-based agents. With all the tests and code you… (☆62, updated last month)
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… (☆134, updated last month)
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 (☆55, updated last month)
- Official implementation of "DeLLMa: Decision Making Under Uncertainty with Large Language Models" (☆42, updated 2 months ago)
- Fun project to run your own LLM chatbot using llama.cpp (☆11, updated last year)
- Official homepage for "Self-Harmonized Chain of Thought" (☆88, updated last month)
- Understanding the correlation between different LLM benchmarks (☆29, updated last year)