Wave: Python Domain-Specific Language for High Performance Machine Learning
☆45 · Updated this week
Alternatives and similar repositories for Wave
Users interested in Wave are comparing it to the libraries listed below.
- ☆18 · Jun 6, 2025 · Updated 8 months ago
- Expert Specialization MoE Solution based on CUTLASS · ☆27 · Jan 19, 2026 · Updated last month
- Website for CSE 234, Winter 2025 · ☆13 · Mar 24, 2025 · Updated 11 months ago
- TensaLang is a Tensor-first programming language, compiler, and runtime that let you write the Model’s inference engine (e.g. LLMs) and s… · ☆67 · Feb 20, 2026 · Updated last week
- ☆43 · Jan 24, 2026 · Updated last month
- A Triton-only attention backend for vLLM · ☆24 · Feb 11, 2026 · Updated 2 weeks ago
- A lightweight Triton-based General Matrix Multiplication (GEMM) library · ☆47 · Updated this week
- Super fast FP32 matrix multiplication on RDNA3 · ☆84 · Mar 30, 2025 · Updated 10 months ago
- ☆32 · Jul 2, 2025 · Updated 7 months ago
- ☆38 · Aug 7, 2025 · Updated 6 months ago
- Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X · ☆75 · Feb 11, 2026 · Updated 2 weeks ago
- ☆66 · Updated this week
- NumPy+Jax with named axes and an uncompromising attitude · ☆23 · Mar 4, 2025 · Updated 11 months ago
- Sample Codes using NVSHMEM on Multi-GPU · ☆30 · Jan 22, 2023 · Updated 3 years ago
- Asynchronous pipeline parallel optimization · ☆19 · Feb 2, 2026 · Updated 3 weeks ago
- ☆27 · Dec 3, 2025 · Updated 2 months ago
- word2vec source code with detailed bilingual annotations, well-annotated word2vec · ☆10 · Oct 3, 2021 · Updated 4 years ago
- The ASPLOS 2025 / EuroSys 2025 Contest Track · ☆40 · Aug 7, 2025 · Updated 6 months ago
- A minimal (really) out-of-tree MLIR example · ☆46 · Aug 14, 2025 · Updated 6 months ago
- DeepSeek-V3/R1 inference performance simulator · ☆177 · Mar 27, 2025 · Updated 11 months ago
- WaferLLM: Large Language Model Inference at Wafer Scale · ☆90 · Jan 7, 2026 · Updated last month
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing · ☆106 · Jun 28, 2025 · Updated 7 months ago
- A high-performance library for gradient based quantum optimal control · ☆12 · Jun 23, 2023 · Updated 2 years ago
- Time-Causal VAE · ☆19 · Nov 8, 2024 · Updated last year
- Official implementation of REArtGS (NeurIPS 2025) · ☆19 · Oct 24, 2025 · Updated 4 months ago
- ☆47 · Mar 14, 2025 · Updated 11 months ago
- Julia API for MLX · ☆14 · Dec 3, 2025 · Updated 2 months ago
- Automated bottleneck detection and solution orchestration · ☆19 · Feb 13, 2026 · Updated 2 weeks ago
- Subscribe to a public Notion page for changes and run a command on every detected change · ☆13 · Apr 6, 2021 · Updated 4 years ago
- ☆13 · Jan 7, 2025 · Updated last year
- Writeup that goes along with this: · ☆41 · Jan 18, 2018 · Updated 8 years ago
- 🎓 Automatically updates a list of LLM inference systems papers daily using GitHub Actions (updated every 12 hours) · ☆12 · Updated this week
- tak is a benchmark thing 👍 (or maybe just an excuse to collect all the languages) · ☆14 · Oct 24, 2025 · Updated 4 months ago
- Code for "What really matters in matrix-whitening optimizers?" · ☆21 · Oct 31, 2025 · Updated 3 months ago
- A retrying client endpoint for WebSocket++ · ☆11 · Aug 23, 2014 · Updated 11 years ago
- Speeding Up Your Python Codes 1000x · ☆12 · Apr 2, 2025 · Updated 10 months ago
- ☆10 · Nov 16, 2024 · Updated last year
- ☆29 · Feb 14, 2026 · Updated last week
- ☆13 · Feb 11, 2026 · Updated 2 weeks ago