A benchmark that challenges language models to code solutions for scientific problems
☆180 · Mar 16, 2026 · Updated last week
Alternatives and similar repositories for SciCode
Users who are interested in SciCode are comparing it to the repositories listed below.
- GPQA: A Graduate-Level Google-Proof Q&A Benchmark · ☆477 · Sep 30, 2024 · Updated last year
- [COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents · ☆254 · Jul 13, 2025 · Updated 8 months ago
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery · ☆132 · Mar 5, 2026 · Updated 2 weeks ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* · ☆121 · Dec 10, 2024 · Updated last year
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI · ☆107 · Mar 6, 2025 · Updated last year
- Benchmarking Benchmark Leakage in Large Language Models · ☆60 · May 20, 2024 · Updated last year
- The official repository of the Omni-MATH benchmark · ☆93 · Dec 22, 2024 · Updated last year
- ☆14 · Apr 16, 2024 · Updated last year
- ☆13 · Jul 14, 2024 · Updated last year
- Collections of RLxLM experiments using minimal codes · ☆14 · Feb 17, 2025 · Updated last year
- ☆85 · Jan 25, 2025 · Updated last year
- ☆49 · Apr 4, 2025 · Updated 11 months ago
- The code and data for the paper JiuZhang3.0 · ☆49 · May 26, 2024 · Updated last year
- Harness for running and evaluating AI agents against RL environments · ☆135 · Mar 6, 2026 · Updated 2 weeks ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… · ☆152 · Jul 12, 2024 · Updated last year
- [ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation" · ☆43 · Jul 19, 2024 · Updated last year
- ☆11 · Jan 3, 2024 · Updated 2 years ago
- Discovering Data-driven Hypotheses in the Wild · ☆136 · Jun 9, 2025 · Updated 9 months ago
- Resources for the Enigmata Project · ☆80 · Aug 13, 2025 · Updated 7 months ago
- Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding · ☆13 · Nov 19, 2024 · Updated last year
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems · ☆64 · Jul 8, 2024 · Updated last year
- ☆42 · Mar 26, 2025 · Updated 11 months ago
- Replicating O1 inference-time scaling laws · ☆93 · Dec 1, 2024 · Updated last year
- Securade.ai Sentinel - A monitoring and surveillance application that enables visual Q&A and video captioning for existing CCTV cameras · ☆27 · Apr 6, 2025 · Updated 11 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] · ☆147 · Sep 20, 2024 · Updated last year
- [ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset · ☆112 · May 22, 2025 · Updated 10 months ago
- ☆334 · May 31, 2025 · Updated 9 months ago
- NaturalProver: Grounded Mathematical Proof Generation with Language Models · ☆39 · Mar 24, 2023 · Updated 3 years ago
- ☆15 · Mar 5, 2026 · Updated 2 weeks ago
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" · ☆187 · May 20, 2025 · Updated 10 months ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" · ☆823 · Jul 16, 2025 · Updated 8 months ago
- The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning (NeurIPS 2022) · ☆16 · Feb 11, 2023 · Updated 3 years ago
- SWE-bench: Can Language Models Resolve Real-world Github Issues? · ☆4,527 · Updated this week
- [ACL 2024] Official GitHub repo for OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scie… · ☆186 · Jun 8, 2025 · Updated 9 months ago
- Safety-J: Evaluating Safety with Critique · ☆16 · Jul 28, 2024 · Updated last year
- [COLM 2024] Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation · ☆15 · Jul 15, 2024 · Updated last year
- [NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" · ☆679 · Mar 16, 2025 · Updated last year
- Predictive Chemistry Augmented with Text Retrieval · ☆25 · Feb 20, 2024 · Updated 2 years ago
- AI for Mathematics Paper List · ☆17 · Jan 14, 2025 · Updated last year