A benchmark for evaluating LLMs on open-ended CS problems. Exploring the Next Frontier of Computer Science.
☆179 Apr 1, 2026 Updated last week
Alternatives and similar repositories for Frontier-CS
Users who are interested in Frontier-CS are comparing it to the libraries listed below.
- The official repository of ALE-Bench ☆175 Mar 31, 2026 Updated last week
- MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs ☆41 Mar 13, 2026 Updated 3 weeks ago
- An efficient hierarchical Graph-based RAG ☆35 Nov 27, 2025 Updated 4 months ago
- Awesome AI Benchmarks ☆28 Jan 16, 2026 Updated 2 months ago
- Crawl & visualize ICLR papers and reviews. ☆18 Nov 5, 2022 Updated 3 years ago
- Experiments on using ChatGPT for failure mode classification ☆12 Sep 20, 2023 Updated 2 years ago
- ☆12 Jan 25, 2026 Updated 2 months ago
- [TMLR 2025 & ICLR 2025 DeLTa] Official Implementation of Design Editing for Offline Model-based Optimization 🧬 🤖 ☆10 Apr 17, 2025 Updated 11 months ago
- [NAACL 2024] A Synthetic, Scalable and Systematic Evaluation Suite for Large Language Models ☆33 Jun 10, 2024 Updated last year
- ☆20 May 14, 2025 Updated 10 months ago
- ☆17 Dec 11, 2024 Updated last year
- ☆30 Dec 23, 2025 Updated 3 months ago
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25] ☆67 Oct 2, 2025 Updated 6 months ago
- ☆22 Feb 28, 2025 Updated last year
- Accelerating MoE with IO and Tile-aware Optimizations ☆621 Apr 1, 2026 Updated last week
- FailureSensorIQ, a dataset and benchmark to probe LLMs’ reasoning and comprehension of sensor–failure relationships in industrial systems… ☆35 Apr 3, 2026 Updated last week
- [AAAI'25] The implementation of paper "Federated Foundation Models on Heterogeneous Time Series" | The first work to explore time series … ☆22 Feb 2, 2026 Updated 2 months ago
- Codes for the paper "Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding" (ACL-IJCNLP 2021) ☆41 Jun 7, 2021 Updated 4 years ago
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions". ☆17 Feb 9, 2026 Updated 2 months ago
- Baselines for Model-Based Optimization installation fixes and compatible with newer AMPERE+ GPUs (e.g. 3090) ☆11 Apr 30, 2023 Updated 2 years ago
- AI-Driven Research Systems (ADRS) ☆136 Dec 17, 2025 Updated 3 months ago
- MLflow deployment plugin For IBM-cloud-watson-ml ☆15 May 7, 2025 Updated 11 months ago
- Big Sur+i3-10100+CVN B460i Gaming V20 ☆12 Nov 13, 2020 Updated 5 years ago
- Code for the paper "Bounce: Reliable High-Dimensional Bayesian Optimization for Combinatorial and Mixed Spaces" ☆15 Apr 30, 2024 Updated last year
- ☆10 May 25, 2021 Updated 4 years ago
- Explaining neural decisions contrastively to alternative decisions. ☆24 Mar 18, 2021 Updated 5 years ago
- [ICLR 2026] The official code for "Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models" ☆26 Feb 7, 2026 Updated 2 months ago
- ☆12 Nov 21, 2023 Updated 2 years ago
- [COLM 2025] Official code for "When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoni… ☆15 Oct 31, 2025 Updated 5 months ago
- ACPBench: Reasoning about Action, Change, and Planning. A benchmark designed to evaluate the fundamental reasoning abilities in the dom… ☆33 Feb 11, 2026 Updated 2 months ago
- Efficient and readable change point detection package implemented in Python. (Singular Spectrum Transformation - SST, IKA-SST, ulSIF, RuL… ☆35 Mar 14, 2026 Updated 3 weeks ago
- Code repository for SRE agent as part of ITBench ☆19 Sep 9, 2025 Updated 7 months ago
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders ☆26 Feb 21, 2025 Updated last year
- ☆19 Jul 17, 2019 Updated 6 years ago
- Code for "A survey and benchmark of high-dimensional Bayesian optimization of discrete sequences" ☆16 Feb 24, 2025 Updated last year
- Repository of <FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models> ☆77 Jan 8, 2026 Updated 3 months ago
- Initial commit ☆13 Aug 14, 2023 Updated 2 years ago
- The repo of "BugLens" ☆39 Nov 12, 2025 Updated 4 months ago
- ☆13 Dec 12, 2025 Updated 4 months ago