keskival / recursive-self-improvement-suite
A suite of open-ended, non-imitative tasks involving generalizable skills for large language model chatbots and agents to enable bootstrapped recursive self-improvement and an unambiguous AGI.
☆32 · Updated last month
Alternatives and similar repositories for recursive-self-improvement-suite:
Users who are interested in recursive-self-improvement-suite are comparing it to the repositories listed below.
- Functional Benchmarks and the Reasoning Gap ☆84 · Updated 5 months ago
- Code and Data for "MIRAI: Evaluating LLM Agents for Event Forecasting" ☆62 · Updated 8 months ago
- ☆39 · Updated 8 months ago
- Evaluation of neuro-symbolic engines ☆35 · Updated 7 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆165 · Updated 2 weeks ago
- ☆50 · Updated 4 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆32 · Updated this week
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆52 · Updated 3 months ago
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆43 · Updated 4 months ago
- Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation ☆41 · Updated last year
- ☆81 · Updated last year
- Interaction-first method for generating demonstrations for web-agents on any website ☆31 · Updated 3 weeks ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ☆78 · Updated 3 weeks ago
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM-based agents. With all the tests and code you… ☆65 · Updated 3 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆142 · Updated last month
- Evaluating LLMs with CommonGen-Lite ☆89 · Updated last year
- ☆68 · Updated last year
- Advanced Reasoning Benchmark Dataset for LLMs ☆45 · Updated last year
- ☆53 · Updated this week
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award … ☆39 · Updated 4 months ago
- Based on the tree of thoughts paper ☆46 · Updated last year
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆103 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 · Updated 6 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆75 · Updated 5 months ago
- This repository contains an LLM benchmark for the social deduction game "Resistance Avalon" ☆101 · Updated last month
- ☆14 · Updated 2 weeks ago
- [ACL 2024] Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View ☆114 · Updated 10 months ago
- A virtual environment for developing and evaluating automated scientific discovery agents. ☆136 · Updated 2 weeks ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆102 · Updated 3 months ago
- Repo for: When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment ☆38 · Updated last year