performance engineering
☆31Jul 11, 2024Updated last year
Alternatives and similar repositories for PE
Users that are interested in PE are comparing it to the libraries listed below.
- Official HPCG benchmark source code☆342Jul 5, 2024Updated last year
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning☆25May 12, 2025Updated last year
- Microbenchmark that unveils the mechanisms behind power readings reported by nvidia-smi on your NVIDIA GPU.☆14Dec 12, 2024Updated last year
- Open source of an IBM Optimized version of the HPCG benchmark.☆17Sep 17, 2025Updated 8 months ago
- Clustering algorithms processing methods on astronomical spectra.☆10Oct 24, 2023Updated 2 years ago
- ☆32Jul 17, 2024Updated last year
- High Performance Grouped GEMM in PyTorch☆30May 10, 2022Updated 4 years ago
- Sheriff consists of two tools: Sheriff-Detect, a false-sharing detector, and Sheriff-Protect, a false-sharing eliminator that you can lin…☆32Jul 6, 2018Updated 7 years ago
- UCAS HPC course code☆15May 24, 2023Updated 3 years ago
- ☆23Mar 2, 2025Updated last year
- FlashInfer Bench @ MLSys 2026: Building AI agents to write high performance GPU kernels☆167Apr 26, 2026Updated last month
- Dynamic CPU/GPU power scaling to maximize renewable energy usage and reduce emissions☆17Apr 11, 2025Updated last year
- Automated course-evaluation tool for UCAS.☆14Apr 23, 2024Updated 2 years ago
- FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.☆60Feb 6, 2026Updated 3 months ago
- Stellar LAbel Machine (SLAM).☆13Aug 11, 2024Updated last year
- Notes on AI Computing Systems (陈云霁) and Numerical Computation (王兵团)☆11Oct 14, 2022Updated 3 years ago
- Yet another (unofficial) Ollama GUI☆20Mar 21, 2026Updated 2 months ago
- The ASPLOS 2025 / EuroSys 2025 Contest Track☆40Apr 4, 2026Updated last month
- ☆40Feb 28, 2020Updated 6 years ago
- This repository contains my solutions to the course Data-driven Astronomy offered by The University of Sydney on Coursera☆17Jan 28, 2021Updated 5 years ago
- UCAS 081100M01002H Image Processing and Analysis: Python implementation examples and experiments☆14Jul 17, 2020Updated 5 years ago
- UCAS compilers assignment: a Clang-based interpreter for C☆43Dec 12, 2021Updated 4 years ago
- Fast and memory-efficient exact attention☆122Updated this week
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of …☆330Jun 10, 2025Updated 11 months ago
- a simple API to use CUPTI☆10Aug 19, 2025Updated 9 months ago
- ☆20May 24, 2025Updated last year
- DiscreteTom's Blog Boilerplate.☆10Mar 6, 2023Updated 3 years ago
- Solution of Programming Massively Parallel Processors☆51Jan 15, 2024Updated 2 years ago
- ☆289May 19, 2026Updated last week
- ☆15Mar 18, 2026Updated 2 months ago
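- A Python login script for the UCAS (Yanqi Lake) 深澜 (Srun) campus network, supporting command-line login and reconnection after a dropped connection; provides login functionality only☆13Jun 16, 2021Updated 4 years ago
- UCAS compilers assignment 2: handling function calls with an LLVM pass☆18Nov 11, 2021Updated 4 years ago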
- AI Computing Systems (智能计算系统), 陈云霁☆199Dec 11, 2022Updated 3 years ago
- ☆16Nov 22, 2022Updated 3 years ago
- To better understand the ggml library☆27Jun 13, 2025Updated 11 months ago
- Scikit-learn tutorial at SciPy2016☆15Aug 14, 2022Updated 3 years ago
- High-performance RMSNorm implementation using SM core storage (registers and shared memory)☆30Jan 22, 2026Updated 4 months ago
- Computer Architecture, fall 2020, UCAS: end-of-chapter exercises for 《计算机体系结构基础》 (Fundamentals of Computer Architecture), 2nd edition☆68Aug 11, 2021Updated 4 years ago
- Benchmark SGLang on SLURM☆24Apr 20, 2026Updated last month