scicode-bench / SciCode
A benchmark that challenges language models to code solutions for scientific problems
☆69 Updated this week
Related projects:
- Official implementation for <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>, accepted by ACL 2024. ☆32 Updated 3 months ago
- Can Language Models Solve Olympiad Programming? ☆92 Updated last month
- ☆100 Updated 2 months ago
- A Retrieval Benchmark for Scientific Literature Search ☆53 Updated 2 months ago
- ☆105 Updated this week
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆182 Updated 4 months ago
- Repository for paper Tools Are Instrumental for Language Agents in Complex Environments ☆32 Updated 8 months ago
- ☆29 Updated 3 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" ☆106 Updated this week
- ☆74 Updated this week
- The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions ☆68 Updated last year
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆52 Updated last month
- Discovering Data-driven Hypotheses in the Wild ☆31 Updated 2 weeks ago
- ☆42 Updated 2 months ago
- A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use ☆106 Updated 5 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆81 Updated last month
- CodeUltraFeedback: aligning large language models to coding preferences ☆62 Updated 2 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆39 Updated 7 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆73 Updated 2 months ago
- Code for the paper 🌳 Tree Search for Language Model Agents ☆124 Updated last month
- A MAD laboratory to improve AI architecture designs 🧪 ☆84 Updated 4 months ago
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning". ☆151 Updated 4 months ago
- Implementation of the Quiet-STaR paper (https://arxiv.org/pdf/2403.09629.pdf) ☆27 Updated last month
- Codebase accompanying the Summary of a Haystack paper. ☆65 Updated 2 months ago
- Code for the paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆140 Updated 2 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆70 Updated last month
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆81 Updated 2 weeks ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆46 Updated 5 months ago
- Experiments for efforts to train a new and improved T5 ☆76 Updated 5 months ago
- Code and Data for Tau-Bench ☆91 Updated this week