performance engineering
☆30 · Jul 11, 2024 · Updated last year
Alternatives and similar repositories for PE
Users who are interested in PE are comparing it to the libraries listed below.
- UCAS compiler assignment 3: points-to analysis ☆19 · Dec 12, 2021 · Updated 4 years ago
- ☆44 · Updated this week
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆25 · May 12, 2025 · Updated 9 months ago
- FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs. ☆54 · Feb 6, 2026 · Updated 3 weeks ago
- High Performance Grouped GEMM in PyTorch ☆31 · May 10, 2022 · Updated 3 years ago
- Sheriff consists of two tools: Sheriff-Detect, a false-sharing detector, and Sheriff-Protect, a false-sharing eliminator that you can lin… ☆32 · Jul 6, 2018 · Updated 7 years ago
- ☆33 · Jul 17, 2024 · Updated last year
- ☆28 · Dec 3, 2025 · Updated 3 months ago
- The ASPLOS 2025 / EuroSys 2025 Contest Track ☆40 · Aug 7, 2025 · Updated 6 months ago
- word2vec source code with detailed bilingual annotations (well-annotated word2vec) ☆10 · Oct 3, 2021 · Updated 4 years ago
- An easy-to-use automatic performance diagnosis and optimization tool for HPC applications ☆35 · Jan 25, 2018 · Updated 8 years ago
- Official HPCG benchmark source code ☆339 · Jul 5, 2024 · Updated last year
- Compute Benchmarks for oneAPI Level Zero and OpenCL™ Driver ☆41 · Feb 25, 2026 · Updated last week
- UCAS graduate course: Advanced Operating Systems, 2023 discussion questions ☆12 · Dec 24, 2023 · Updated 2 years ago
- ☆10 · May 5, 2022 · Updated 3 years ago
- ☆20 · May 24, 2025 · Updated 9 months ago
- ☆40 · Feb 28, 2020 · Updated 6 years ago
- UCAS compiler assignment: a Clang-based interpreter for C ☆43 · Dec 12, 2021 · Updated 4 years ago
- CPU and GPU tutorial examples ☆13 · Apr 4, 2025 · Updated 11 months ago
- BigBang-Proton is a LLM pretrained on cross-scale, cross-structure, cross-discipline real-world scientific tasks to construct a scienti… ☆22 · Nov 8, 2025 · Updated 3 months ago
- Unofficial mirror of svn://svn.code.sf.net/p/openfoam-extend/svn/trunk/Breeder_2.0/libraries/swak4Foam/ ☆11 · Jun 8, 2017 · Updated 8 years ago
- Microbenchmark that unveils the mechanisms behind power readings reported by nvidia-smi on your NVIDIA GPU. ☆14 · Dec 12, 2024 · Updated last year
- PyTorch routines for (Ker)nel (Mac)hines ☆10 · Oct 10, 2025 · Updated 4 months ago
- a simple API to use CUPTI ☆11 · Aug 19, 2025 · Updated 6 months ago
- ☆14 · Feb 11, 2026 · Updated 3 weeks ago
- DiscreteTom's Blog Boilerplate. ☆10 · Mar 6, 2023 · Updated 2 years ago
- SQL Optimizations using MLIR ☆12 · Apr 5, 2020 · Updated 5 years ago
- Speeding Up Your Python Codes 1000x ☆12 · Apr 2, 2025 · Updated 11 months ago
- Code for "What really matters in matrix-whitening optimizers?" ☆22 · Oct 31, 2025 · Updated 4 months ago
- Large language models to diffusion finetuning code ☆24 · Jun 2, 2025 · Updated 9 months ago
- Automated bottleneck detection and solution orchestration ☆19 · Feb 24, 2026 · Updated last week
- Persistent dense gemm for Hopper in `CuTeDSL` ☆15 · Aug 9, 2025 · Updated 6 months ago
- ☆18 · Jun 6, 2025 · Updated 8 months ago
- ☆23 · Jul 11, 2025 · Updated 7 months ago
- a vue-demo: a Vue clone of the NetEase News mobile site ☆10 · Jul 26, 2017 · Updated 8 years ago
- ☆13 · Jan 7, 2025 · Updated last year
- automatically apply for Indeed jobs. ☆13 · Nov 10, 2021 · Updated 4 years ago
- A selective knowledge distillation algorithm for efficient speculative decoders ☆36 · Nov 27, 2025 · Updated 3 months ago
- Simple RAM benchmark for Linux. ☆11 · Aug 4, 2021 · Updated 4 years ago