siegelz / core-bench
☆39Updated 3 months ago
Alternatives and similar repositories for core-bench
Users who are interested in core-bench are comparing it to the libraries listed below.
- ☆89Updated this week
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.☆173Updated 3 months ago
- A virtual environment for developing and evaluating automated scientific discovery agents.☆156Updated 2 months ago
- SWE Arena☆33Updated last month
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery☆87Updated last month
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning☆173Updated this week
- Train your own SOTA deductive reasoning model☆93Updated 3 months ago
- Reproducible, flexible LLM evaluations☆204Updated 3 weeks ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".☆201Updated last month
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts?☆54Updated 3 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search☆89Updated 6 months ago
- AWM: Agent Workflow Memory☆271Updated 4 months ago
- Complex Function Calling Benchmark.☆112Updated 4 months ago
- A simple unified framework for evaluating LLMs☆215Updated last month
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…☆75Updated 9 months ago
- PyTorch library for Active Fine-Tuning☆79Updated 3 months ago
- A benchmark that challenges language models to code solutions for scientific problems☆123Updated this week
- Source code for the collaborative reasoner research project at Meta FAIR.☆87Updated last month
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments☆81Updated 8 months ago
- ☆205Updated 3 months ago
- Codebase accompanying the Summary of a Haystack paper.☆78Updated 8 months ago
- Official Implementation of "Reasoning Language Models: A Blueprint"☆62Updated 3 months ago
- ☆30Updated 3 weeks ago
- ⚖️ Awesome LLM Judges ⚖️☆104Updated last month
- This repository contains ScholarQABench data and evaluation pipeline.☆72Updated last month
- Code for the paper 🌳 Tree Search for Language Model Agents☆200Updated 10 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information.☆235Updated 8 months ago
- Evaluating LLMs with fewer examples☆156Updated last year
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award …☆41Updated 7 months ago
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀☆97Updated 7 months ago