forecastingresearch / forecastbench-datasets
Forecastbench Datasets, updated nightly
☆11 Updated this week
Alternatives and similar repositories for forecastbench-datasets
Users who are interested in forecastbench-datasets are comparing it to the repositories listed below.
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆32 Updated 2 months ago
- ☆26 Updated 11 months ago
- ☆10 Updated 2 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 Updated 5 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 Updated 9 months ago
- OneEdit: A Neural-Symbolic Collaboratively Knowledge Editing System. ☆18 Updated 8 months ago
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) ☆37 Updated 6 months ago
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆16 Updated last year
- Agent Skill Induction: "Inducing Programmatic Skills for Agentic Tasks" ☆24 Updated 2 months ago
- ☆65 Updated 3 months ago
- Code and Data for "MIRAI: Evaluating LLM Agents for Event Forecasting" ☆65 Updated 11 months ago
- ☆57 Updated 9 months ago
- ☆15 Updated 8 months ago
- [ACL 2025] Agentic Knowledgeable Self-awareness ☆72 Updated 2 weeks ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆34 Updated last year
- Source code and utilities for the Genesys distributed language model architecture discovery system. ☆28 Updated this week
- Code and Data for "Language Modeling with Editable External Knowledge" ☆33 Updated last year
- ☆50 Updated last month
- ☆20 Updated 3 months ago
- ☆11 Updated 2 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆54 Updated last year
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… ☆26 Updated 6 months ago
- Official repo for Learning to Reason for Long-Form Story Generation ☆63 Updated 2 months ago
- ☆19 Updated last month
- [ICLR 2025] ELICIT: LLM Augmentation Via External In-context Capability ☆11 Updated 3 months ago
- An attribution library for LLMs ☆41 Updated 9 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆58 Updated 6 months ago
- Official Implementation of UA^{2}-Agent and other baseline algorithms of "Towards Unified Alignment Between Agents, Humans, and Environme… ☆18 Updated 7 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆53 Updated 3 months ago
- ☆24 Updated 5 months ago