mKernel: fast multi-node, multi-GPU fused kernels
☆237Jun 8, 2026Updated last week
Alternatives and similar repositories for mKernel
Users that are interested in mKernel are comparing it to the libraries listed below.
- A PyTorch native platform for training generative AI models☆17Apr 21, 2026Updated last month
- [NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning☆102Apr 20, 2026Updated last month
- MS108 Course Project, SJTU ACM Class.☆34Dec 20, 2022Updated 3 years ago
- Canvas: End-to-End Kernel Architecture Search in Neural Networks☆27Nov 18, 2024Updated last year
- Profiling and Improving the PyTorch Dataloader for high-latency Storage☆21Apr 18, 2023Updated 3 years ago
- A deep learning intermediate representation for multi-platform compilation optimization☆10Oct 28, 2024Updated last year
- ☆19Jun 13, 2025Updated last year
- My study note for mlsys☆14Nov 4, 2024Updated last year
- Accelerating MoE with IO and Tile-aware Optimizations☆714Updated this week
- multi-master-paxos with 3 nodes☆14Apr 11, 2022Updated 4 years ago
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders☆27Feb 21, 2025Updated last year
- ☆12Jan 12, 2018Updated 8 years ago
- Share your GPU without MIG or MPS☆51Jan 27, 2026Updated 4 months ago
- Python Script to Open SJTU Dormitory Smart Lock☆10Sep 12, 2022Updated 3 years ago
- ☆54Mar 15, 2025Updated last year
- A fast text search engine built for SSDs, written in C++.☆11Aug 29, 2022Updated 3 years ago
- Tutorials of Extending and importing TVM with CMAKE Include dependency.☆16Oct 11, 2024Updated last year
- ☆17Nov 10, 2021Updated 4 years ago
- ☆12Sep 4, 2021Updated 4 years ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches☆15Jun 21, 2019Updated 6 years ago
- SGLang kernel library for NPU☆143Jun 11, 2026Updated last week
- ☆19Aug 24, 2022Updated 3 years ago
- ☆13Mar 6, 2023Updated 3 years ago
- RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It…☆10Nov 30, 2021Updated 4 years ago
- Triton adapter for Ascend. Mirror of https://gitcode.com/ascend/triton-ascend☆125May 18, 2026Updated last month
- An implementation of parallelism techniques such as AMP, DDP, PP, and TP, for learning purposes☆14Nov 18, 2023Updated 2 years ago
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …☆32Dec 21, 2024Updated last year
- CloudMapper helps you analyze your Amazon Web Services (AWS) environments.☆12Nov 8, 2021Updated 4 years ago
- ☆14Apr 28, 2026Updated last month
- Byte-Addressable File System☆19Mar 29, 2021Updated 5 years ago
- [EuroSys'25] Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization☆23Apr 13, 2026Updated 2 months ago
- ☆21Jun 9, 2025Updated last year
- ☆68Nov 29, 2025Updated 6 months ago
- ☆14Mar 4, 2015Updated 11 years ago
- A Long Short Term Memory neural network for time series prediction. Memory blocks contain one memory cell in each. Weights for the networ…☆15Sep 3, 2018Updated 7 years ago
- Cataloging released Triton kernels.☆308Sep 9, 2025Updated 9 months ago
- Luthier, a GPU binary instrumentation tool for AMD GPUs☆27Jun 11, 2026Updated last week
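- FaceCat (捂脸猫) is a cross-platform, cross-language framework for graphics and communication services, developed mainly by Tao De (矿洞程序员陶德). It is available in C++, C#, and Java, and runs on Windows, iOS, Android, macOS, and Linux. The framework is open source under the BSD license; the current public release is only the low-level framework and does not yet include…☆14May 26, 2019Updated 7 years ago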
- A list of papers I have read.☆12Nov 19, 2023Updated 2 years ago