open-neutrino / neutrino
☆237Dec 25, 2025Updated last month
Alternatives and similar repositories for neutrino
Users that are interested in neutrino are comparing it to the libraries listed below
- A scheduling framework for multitasking over diverse XPUs, including GPUs, NPUs, ASICs, and FPGAs☆158Jan 13, 2026Updated last month
- Multi-Level Triton Runner supporting Python, IR, PTX, and cubin.☆84Jan 27, 2026Updated 3 weeks ago
- UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g…☆1,211Updated this week
- ☆52Oct 10, 2024Updated last year
- Fast OS-level support for GPU checkpoint and restore☆271Sep 28, 2025Updated 4 months ago
- Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes☆146Mar 29, 2025Updated 10 months ago
- Simulator code of the paper "Dissecting and Modeling the Architecture of Modern GPU Cores"☆64Oct 15, 2025Updated 4 months ago
- ☆24May 9, 2025Updated 9 months ago
- ☆81Jan 22, 2026Updated 3 weeks ago
- Distributed Compiler based on Triton for Parallel Systems☆1,358Updated this week
- ☆13Feb 6, 2026Updated last week
- An experimental parallel training platform☆56Mar 25, 2024Updated last year
- GPU Performance Advisor☆65Jul 25, 2022Updated 3 years ago
- NCCL Profiling Kit☆152Jul 1, 2024Updated last year
- Pin based tool for simulation of rack-scale disaggregated memory systems☆32Mar 8, 2025Updated 11 months ago
- Revealing the Unstable Foundations of eBPF-Based Kernel Extensions☆17May 20, 2025Updated 8 months ago
- Automated Testing and Adaptive Detection of **Slow Faults** in Distributed Systems☆18Mar 6, 2025Updated 11 months ago
- ☆12May 13, 2025Updated 9 months ago
- ☆65Apr 26, 2025Updated 9 months ago
- GeminiFS: A Companion File System for GPUs☆72Feb 18, 2025Updated last year
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.☆14Nov 23, 2024Updated last year
- NEO is a LLM inference engine built to save the GPU memory crisis by CPU offloading☆84Jun 16, 2025Updated 8 months ago
- Tutorials for NVIDIA CUPTI samples☆54Nov 3, 2025Updated 3 months ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆106Jun 28, 2025Updated 7 months ago
- DeeperGEMM: crazy optimized version☆74May 5, 2025Updated 9 months ago
- ☆15Jul 9, 2024Updated last year
- ☆32Jul 2, 2025Updated 7 months ago
- Artifacts of EuroSys'24 paper "Exploring Performance and Cost Optimization with ASIC-Based CXL Memory"☆31Feb 21, 2024Updated last year
- Dynamic Memory Management for Serving LLMs without PagedAttention☆461May 30, 2025Updated 8 months ago
- collection of benchmarks to measure basic GPU capabilities☆494Oct 24, 2025Updated 3 months ago
- ☆114May 16, 2025Updated 9 months ago
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel☆2,130Updated this week
- GVProf: A Value Profiler for GPU-based Clusters☆52Mar 24, 2024Updated last year
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding☆88Dec 2, 2025Updated 2 months ago
- Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs.☆94Feb 23, 2023Updated 2 years ago
- ☆16Apr 22, 2025Updated 9 months ago
- Open ABI and FFI for Machine Learning Systems☆337Updated this week
- The Artifact Evaluation Version of SOSP Paper #19☆52Aug 19, 2024Updated last year
- Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]☆41May 13, 2025Updated 9 months ago