A benchmark that challenges language models to code solutions for scientific problems
☆186 · Apr 6, 2026 · Updated last week
Alternatives and similar repositories for SciCode
Users that are interested in SciCode are comparing it to the libraries listed below.
- GPQA: A Graduate-Level Google-Proof Q&A Benchmark ☆484 · Sep 30, 2024 · Updated last year
- [COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents ☆260 · Jul 13, 2025 · Updated 9 months ago
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆134 · Mar 5, 2026 · Updated last month
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆121 · Dec 10, 2024 · Updated last year
- Benchmarking Benchmark Leakage in Large Language Models ☆60 · May 20, 2024 · Updated last year
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI ☆106 · Mar 6, 2025 · Updated last year
- The official repository of the Omni-MATH benchmark. ☆93 · Dec 22, 2024 · Updated last year
- ☆14 · Apr 16, 2024 · Updated last year
- ☆13 · Jul 14, 2024 · Updated last year
- Collections of RLxLM experiments using minimal codes ☆14 · Feb 17, 2025 · Updated last year
- ☆84 · Jan 25, 2025 · Updated last year
- ☆49 · Apr 4, 2025 · Updated last year
- The code and data for the paper JiuZhang3.0 ☆49 · May 26, 2024 · Updated last year
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆154 · Jul 12, 2024 · Updated last year
- Harness for running and evaluating AI agents against RL environments ☆146 · Updated this week
- [ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation" ☆43 · Jul 19, 2024 · Updated last year
- ☆11 · Jan 3, 2024 · Updated 2 years ago
- Discovering Data-driven Hypotheses in the Wild ☆137 · Jun 9, 2025 · Updated 10 months ago
- Resources for the Enigmata Project. ☆81 · Aug 13, 2025 · Updated 8 months ago
- Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding. ☆13 · Nov 19, 2024 · Updated last year
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆64 · Jul 8, 2024 · Updated last year
- Replicating O1 inference-time scaling laws ☆93 · Dec 1, 2024 · Updated last year
- ☆42 · Mar 26, 2025 · Updated last year
- Securade.ai Sentinel - A monitoring and surveillance application that enables visual Q&A and video captioning for existing CCTV cameras. ☆29 · Apr 6, 2025 · Updated last year
- [ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset ☆113 · May 22, 2025 · Updated 10 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆147 · Sep 20, 2024 · Updated last year
- ☆334 · May 31, 2025 · Updated 10 months ago
- [NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ☆688 · Mar 16, 2025 · Updated last year
- NaturalProver: Grounded Mathematical Proof Generation with Language Models ☆39 · Mar 24, 2023 · Updated 3 years ago
- SWE-bench: Can Language Models Resolve Real-world Github Issues? ☆4,676 · Apr 1, 2026 · Updated last week
- ☆15 · Mar 30, 2026 · Updated 2 weeks ago
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆189 · May 20, 2025 · Updated 10 months ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆841 · Jul 16, 2025 · Updated 8 months ago
- The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning (NeurIPS 2022) ☆16 · Feb 11, 2023 · Updated 3 years ago
- Safety-J: Evaluating Safety with Critique ☆16 · Jul 28, 2024 · Updated last year
- [ACL 2024] Official GitHub repo for OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scie… ☆191 · Jun 8, 2025 · Updated 10 months ago
- [COLM 2024] Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation ☆15 · Jul 15, 2024 · Updated last year
- Predictive Chemistry Augmented with Text Retrieval ☆25 · Feb 20, 2024 · Updated 2 years ago
- AI for Mathematics Paper List ☆17 · Jan 14, 2025 · Updated last year