TiledKernel is a code generation library based on macro kernels and a memory-hierarchy graph data structure.
☆ 19 · May 12, 2024 · Updated last year
Alternatives and similar repositories for TiledKernel
Users that are interested in TiledKernel are comparing it to the libraries listed below
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust. ☆ 14 · Nov 23, 2024 · Updated last year
- My Collections of CUDA Kernels~ ☆ 11 · Jun 25, 2024 · Updated last year
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel … ☆ 193 · Jan 28, 2025 · Updated last year
- Experiment: llama2 inference implemented in Rust ☆ 17 · Feb 23, 2024 · Updated 2 years ago
- My tests and experiments with some popular dl frameworks. ☆ 17 · Sep 11, 2025 · Updated 5 months ago
- CAKE Library for constant-bandwidth matrix multiplication on CPUs ☆ 14 · Apr 6, 2024 · Updated last year
- An extension of TVMScript to write simple and high-performance GPU kernels with Tensor Cores. ☆ 51 · Jul 23, 2024 · Updated last year
- ☆ 20 · Sep 28, 2024 · Updated last year
- Framework to reduce autotune overhead to zero for well known deployments. ☆ 96 · Sep 19, 2025 · Updated 5 months ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆ 106 · Jun 28, 2025 · Updated 7 months ago
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections ☆ 125 · Jun 23, 2022 · Updated 3 years ago
- Quantized Attention on GPU ☆ 44 · Nov 22, 2024 · Updated last year
- ☆ 11 · Apr 3, 2023 · Updated 2 years ago
- Parsers for CUDA binary files ☆ 24 · Dec 29, 2023 · Updated 2 years ago
- ☆ 42 · Nov 1, 2025 · Updated 3 months ago
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆ 30 · Jan 28, 2026 · Updated last month
- ☆ 24 · May 9, 2025 · Updated 9 months ago
- gups mirror ☆ 11 · Oct 25, 2015 · Updated 10 years ago
- FlexAttention w/ FlashAttention3 Support ☆ 27 · Oct 5, 2024 · Updated last year
- [ASP-DAC 2025] "NeuronQuant: Accurate and Efficient Post-Training Quantization for Spiking Neural Networks" Official Implementation ☆ 15 · Mar 6, 2025 · Updated 11 months ago
- GEMM and Winograd based convolutions using CUTLASS ☆ 28 · Jul 15, 2020 · Updated 5 years ago
- General Purpose Graphics Processing Unit (GPGPU) IP Core ☆ 11 · Jul 4, 2014 · Updated 11 years ago
- TileGraph is an experimental DNN compiler that utilizes static code generation and kernel fusion techniques. ☆ 12 · Sep 18, 2024 · Updated last year
- … device tree binary objects ☆ 14 · Nov 22, 2025 · Updated 3 months ago
- ☆ 32 · Dec 1, 2022 · Updated 3 years ago
- ☆ 18 · Apr 8, 2022 · Updated 3 years ago
- Benchmark tests supporting the TiledCUDA library. ☆ 18 · Nov 19, 2024 · Updated last year
- ☆ 105 · Nov 7, 2024 · Updated last year
- Transformers components but in Triton ☆ 34 · May 9, 2025 · Updated 9 months ago
- ☆ 87 · Updated this week
- ☆ 126 · Jan 22, 2026 · Updated last month
- ☆ 15 · Dec 16, 2021 · Updated 4 years ago
- Noisy language compiler ☆ 17 · Jul 31, 2024 · Updated last year
- PTX-EMU is a simple emulator for CUDA programs. ☆ 37 · Apr 25, 2025 · Updated 10 months ago
- ☆ 18 · Oct 15, 2020 · Updated 5 years ago
- DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang ☆ 43 · Nov 19, 2025 · Updated 3 months ago
- Parallel Associative Scan for Language Models ☆ 18 · Jan 8, 2024 · Updated 2 years ago
- Hypervisor written in Rust for the RISC-V 1.0 hypervisor extension ☆ 16 · Oct 21, 2024 · Updated last year
- modified cutlass ☆ 15 · Oct 26, 2020 · Updated 5 years ago