forecastingresearch / forecastbench-datasets
Forecastbench Datasets, updated nightly
☆12 Updated this week
Alternatives and similar repositories for forecastbench-datasets
Users who are interested in forecastbench-datasets are comparing it to the repositories listed below.
- Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! ☆52 Updated last month
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆59 Updated 8 months ago
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you… ☆76 Updated 7 months ago
- ☆43 Updated 9 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆87 Updated 10 months ago
- A framework for pitting LLMs against each other in an evolving library of games ☆32 Updated 3 months ago
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆17 Updated last year
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 Updated 6 months ago
- This repository contains popular code generation frameworks such as MapCoder, CodeSIM. ☆56 Updated last month
- ☆54 Updated last month
- ☆100 Updated 2 months ago
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". ☆69 Updated last year
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆83 Updated 4 months ago
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning ☆61 Updated last month
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆120 Updated 9 months ago
- ☆76 Updated this week
- accompanying material for sleep-time compute paper ☆102 Updated 3 months ago
- ☆27 Updated last year
- Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation ☆44 Updated last year
- Official repo for Learning to Reason for Long-Form Story Generation ☆68 Updated 3 months ago
- ☆13 Updated 3 months ago
- OneEdit: A Neural-Symbolic Collaboratively Knowledge Editing System. ☆19 Updated 9 months ago
- ☆18 Updated last month
- Code and Data for "MIRAI: Evaluating LLM Agents for Event Forecasting" ☆68 Updated last year
- The Library for LLM-based multi-agent applications ☆92 Updated 3 weeks ago
- Code and Data for "Language Modeling with Editable External Knowledge" ☆34 Updated last year
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆97 Updated 2 months ago
- Approximating the joint distribution of language models via MCTS ☆21 Updated 9 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆36 Updated last year
- ☆66 Updated 4 months ago