siegelz / core-bench
☆40 Updated 3 months ago
Alternatives and similar repositories for core-bench
Users that are interested in core-bench are comparing it to the libraries listed below
- ☆92 Updated 3 weeks ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆173 Updated 3 months ago
- Functional Benchmarks and the Reasoning Gap ☆87 Updated 8 months ago
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆89 Updated 2 weeks ago
- Discovering Data-driven Hypotheses in the Wild ☆94 Updated 2 weeks ago
- A virtual environment for developing and evaluating automated scientific discovery agents. ☆161 Updated 3 months ago
- A benchmark that challenges language models to code solutions for scientific problems ☆123 Updated this week
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆89 Updated 6 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆82 Updated 8 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆78 Updated 9 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆91 Updated 2 months ago
- ☆32 Updated last month
- Train your own SOTA deductive reasoning model ☆94 Updated 3 months ago
- Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory ☆62 Updated last month
- PyTorch library for Active Fine-Tuning ☆80 Updated 4 months ago
- ☆87 Updated 2 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 Updated last year
- AWM: Agent Workflow Memory ☆279 Updated 4 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆206 Updated 3 weeks ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆184 Updated this week
- ☆52 Updated 2 weeks ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆32 Updated 2 months ago
- This repository contains ScholarQABench data and evaluation pipeline. ☆72 Updated 2 months ago
- Official repository for paper "ReasonIR Training Retrievers for Reasoning Tasks". ☆172 Updated this week
- Official Code Repository for the paper "Distilling LLM Agent into Small Models with Retrieval and Code Tools" ☆109 Updated 3 weeks ago
- Evaluating LLMs with fewer examples ☆158 Updated last year
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) ☆37 Updated 5 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆95 Updated 2 weeks ago
- ☆61 Updated 3 weeks ago
- CodeScientist: An automated scientific discovery system for code-based experiments ☆273 Updated this week