FrontierCS/Frontier-CS

A benchmark for evaluating LLMs on open-ended CS problems. Exploring the Next Frontier of Computer Science.

☆288

Alternatives and similar repositories for Frontier-CS

Users interested in Frontier-CS are comparing it to the repositories listed below.

FrontierCS / FrontierSmith
FrontierSmith, a new system that uses AI to synthesize open-ended coding problems at scale
☆50 · May 30, 2026 · Updated last month
skydiscover-ai / skydiscover
AI-Driven Scientific and Algorithmic Discovery
☆590 · Jun 14, 2026 · Updated last month
ypwang61 / ThetaEvolve
ThetaEvolve: Test-time Learning on Open Problems, enabling RL training on AlphaEvolve/OpenEvolve and emphasizing scaling test-time comput…
☆172 · Feb 27, 2026 · Updated 5 months ago
mert-cemri / autoevolve
☆24 · Dec 6, 2025 · Updated 7 months ago
benchjack / benchjack
AI agent benchmark hackability scanner: find evaluation vulnerabilities before they undermine your results
☆40 · May 25, 2026 · Updated 2 months ago
gso-bench / gso
[NeurIPS '25] GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents
☆90 · Jul 12, 2026 · Updated 2 weeks ago
NovaSky-AI / SkyRL
SkyRL: A Modular Full-stack RL Library for LLMs
☆2,102 · Updated this week
skylight-org / sparse-attention-hub
Advancing the frontier of efficient AI
☆67 · Jul 10, 2026 · Updated 2 weeks ago
flashinfer-ai / flashinfer-bench
Building the Virtuous Cycle for AI-driven LLM Systems
☆261 · May 1, 2026 · Updated 2 months ago
StarTrail-org / RAG-DS-Serve
[AAAI26]: DS SERVE: The Largest Open Vector Store over Pretraining Data; A Framework for Efficient and Scalable Neural Retrieval
☆53 · Jan 28, 2026 · Updated 6 months ago
SakanaAI / ALE-Bench
The official repository of ALE-Bench
☆203 · Jul 16, 2026 · Updated last week
oripress / AlgoTune
AlgoTune is a NeurIPS 2025 benchmark made up of 154 math, physics, and computer science problems. The goal is to write code that solves each…
☆112 · Jun 24, 2026 · Updated last month
aisa-group / PostTrainBench
Measuring how well CLI agents like Claude Code or Codex CLI can post-train base LLMs on a single H100 GPU in 10 hours
☆472 · Jul 22, 2026 · Updated last week
NVlabs / ProfBench
PhD/MBA-level human-annotated rubrics dataset across Physics, Chemistry, Finance and Consulting
☆32 · Oct 30, 2025 · Updated 8 months ago
test-time-training / discover
☆611 · May 24, 2026 · Updated 2 months ago
caoshiyi / K-Search
Automated High-Performance GPU Kernel Generation
☆120 · Jun 1, 2026 · Updated last month
ars22 / e3
☆20 · Sep 16, 2025 · Updated 10 months ago
chchenhui / mlrbench
[NeurIPS 2025 D&B Track] MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research
☆33 · May 8, 2026 · Updated 2 months ago
visgym / VisGym
Official Repository of VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
☆113 · May 3, 2026 · Updated 2 months ago
Infini-AI-Lab / vortex_torch
Vortex: Programmable Sparse Attention for Agents as Algorithm Designers
☆67 · Jun 24, 2026 · Updated last month
Job-Bench / job-bench-eval
Official eval scripts for JobBench
☆31 · Jul 18, 2026 · Updated last week
radixark / miles
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
☆1,805 · Updated this week
Hanchenli / vllm-continuum
Preview Code for Continuum Paper
☆91 · Jul 20, 2026 · Updated last week
GAIR-NLP / InnovatorBench
[ICLR 2026] InnovatorBench: Evaluating Agents' Ability to Conduct Innovative LLM Research
☆16 · Feb 3, 2026 · Updated 5 months ago
CUHK-Shenzhen-SE / UTBoost
[ACL'25] UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench
☆36 · Aug 12, 2025 · Updated 11 months ago
UCB-ADRS / ADRS
AI-Driven Research Systems (ADRS)
☆146 · Dec 17, 2025 · Updated 7 months ago
Human-Agent-Society / CORAL
🔥🔥COLM 2026🔥🔥 CORAL is a robust, lightweight infrastructure for multi-agent autonomous self-evolution, built for autoresearch. W…
☆852 · Updated this week
Imbernoulli / MLS-Bench
☆75 · Updated this week
ScalingIntelligence / KernelBench
KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)
☆1,163 · Mar 24, 2026 · Updated 4 months ago
OpenAgentEval / SWE-ABS
[ICML 2026] SWE-ABS: Adversarial Benchmark Strengthening Exposes Inflated Success Rates on Test-based Benchmark
☆22 · May 6, 2026 · Updated 2 months ago
wq-will / SimpleTES
A general framework for strategically scaling evaluation-driven discovery loops, discovering state-of-the-art solutions on 21 open-ended …
☆159 · Updated this week
RobustNLP / TestNER
A toolkit for testing and improving named entity recognition [ESEC/FSE'23]
☆11 · Aug 31, 2023 · Updated 2 years ago
uccl-project / mKernel
mKernel: fast multi-node, multi-GPU fused kernels
☆257 · Jun 21, 2026 · Updated last month
harbor-framework / harbor
Framework for evaluating and improving agents
☆3,611 · Updated this week
togethercomputer / ParallelKernelBench
☆45 · Updated this week
zksha / alma
ALMA (Automated meta-Learning of Memory designs for Agentic systems) is a framework that meta-learns memory designs to replace human-engi…
☆250 · Apr 8, 2026 · Updated 3 months ago
pgasawa / continual-learning-bench
Continual Learning Bench
☆189 · Jul 19, 2026 · Updated last week
MaoZiming / papers
Paper-reading notes for Berkeley OS prelim exam.
☆14 · Aug 28, 2024 · Updated last year
algorithmicsuperintelligence / openevolve
Open-source implementation of AlphaEvolve
☆6,815 · Jul 18, 2026 · Updated last week