A benchmark for evaluating LLMs on open-ended CS problems. Exploring the Next Frontier of Computer Science.
☆211 May 13, 2026 Updated last week
Alternatives and similar repositories for Frontier-CS
Users who are interested in Frontier-CS are comparing it to the libraries listed below.
- ThetaEvolve: Test-time Learning on Open Problems, enabling RL training on AlphaEvolve/OpenEvolve and emphasizing scaling test-time comput… ☆158 Feb 27, 2026 Updated 2 months ago
- The official repository of ALE-Bench ☆182 Updated this week
- Preview Code for Continuum Paper ☆77 Apr 13, 2026 Updated last month
- An efficient hierarchical Graph-based RAG ☆40 Nov 27, 2025 Updated 5 months ago
- Awesome AI Benchmarks ☆32 Jan 16, 2026 Updated 4 months ago
- An Autonomous Curriculum Reinforcement Learning framework that steers agents to continually learn in specific environments with zero huma… ☆34 May 13, 2026 Updated last week
- Repository for "Training Language Models To Explain Their Own Computations" ☆22 Dec 22, 2025 Updated 5 months ago
- Crawl & visualize ICLR papers and reviews. ☆18 Nov 5, 2022 Updated 3 years ago
- Class materials, homeworks and videos for probation preparation. ☆24 Feb 3, 2026 Updated 3 months ago
- Experiments on using ChatGPT for failure mode classification ☆12 Sep 20, 2023 Updated 2 years ago
- [TMLR 2025 & ICLR 2025 DeLTa] Official Implementation of Design Editing for Offline Model-based Optimization 🧬 🤖 ☆10 Apr 17, 2025 Updated last year
- [NAACL 2024] A Synthetic, Scalable and Systematic Evaluation Suite for Large Language Models ☆33 Jun 10, 2024 Updated last year
- ☆20 May 14, 2025 Updated last year
- Official implementation of NeurIPS'24 Spotlight paper "Monte Carlo Tree Search based Space Transfer for Black-box Optimization". ☆13 Nov 28, 2024 Updated last year
- DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation. ☆137 Feb 10, 2026 Updated 3 months ago
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25] ☆68 Oct 2, 2025 Updated 7 months ago
- FailureSensorIQ, a dataset and benchmark to probe LLMs’ reasoning and comprehension of sensor–failure relationships in industrial systems… ☆43 Updated this week
- ☆14 Nov 2, 2022 Updated 3 years ago
- Codes for the paper "Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding" (ACL-IJCNLP 2021) ☆41 Jun 7, 2021 Updated 4 years ago
- [AAAI'25] The implementation of paper "Federated Foundation Models on Heterogeneous Time Series" | The first work to explore time series … ☆23 May 10, 2026 Updated 2 weeks ago
- Accelerating MoE with IO and Tile-aware Optimizations ☆691 May 14, 2026 Updated last week
- Baselines for Model-Based Optimization installation fixes and compatible with newer AMPERE+ GPUs (e.g. 3090) ☆11 Apr 30, 2023 Updated 3 years ago
- AI-Driven Research Systems (ADRS) ☆142 Dec 17, 2025 Updated 5 months ago
- Big Sur+i3-10100+CVN B460i Gaming V20 ☆12 Nov 13, 2020 Updated 5 years ago
- Code for the paper "Bounce: Reliable High-Dimensional Bayesian Optimization for Combinatorial and Mixed Spaces" ☆16 Apr 30, 2024 Updated 2 years ago
- Explaining neural decisions contrastively to alternative decisions. ☆24 Mar 18, 2021 Updated 5 years ago
- [ICLR 2026] The official code for "Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models" ☆26 Feb 7, 2026 Updated 3 months ago
- ☆19 Mar 31, 2024 Updated 2 years ago
- ACPBench: Reasoning about Action, Change, and Planning. A benchmark designed to evaluate the fundamental reasoning abilities in the dom… ☆33 Feb 11, 2026 Updated 3 months ago
- Efficient and readable change point detection package implemented in Python. (Singular Spectrum Transformation - SST, IKA-SST, ulSIF, RuL… ☆35 May 12, 2026 Updated last week
- [COLM 2025] Official code for "When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoni… ☆15 Oct 31, 2025 Updated 6 months ago
- ☆19 Jul 17, 2019 Updated 6 years ago
- [ICLR 2026] GRAPE: Group Representational Position Encoding (https://arxiv.org/abs/2512.07805) ☆91 May 13, 2026 Updated last week
- Repository of <FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models> ☆76 Jan 8, 2026 Updated 4 months ago
- The repo of "BugLens" ☆41 Nov 12, 2025 Updated 6 months ago
- Initial commit ☆13 Aug 14, 2023 Updated 2 years ago
- ☆13 Dec 12, 2025 Updated 5 months ago
- Example Jop and Rop attack at Arm aarch64 platform ☆10 Sep 8, 2020 Updated 5 years ago
- AlgoTune is a NeurIPS 2025 benchmark made up of 154 math, physics, and computer science problems. The goal is write code that solves each… ☆98 Mar 12, 2026 Updated 2 months ago