☆66Updated this week
Alternatives and similar repositories for FlyDSL
Users who are interested in FlyDSL are comparing it to the libraries listed below.
- A Triton-only attention backend for vLLM☆24Feb 11, 2026Updated 2 weeks ago
- ☆39Dec 14, 2025Updated 2 months ago
- Persistent dense gemm for Hopper in `CuTeDSL`☆15Aug 9, 2025Updated 6 months ago
- ☆43Jan 24, 2026Updated last month
- Benchmark tests supporting the TiledCUDA library.☆18Nov 19, 2024Updated last year
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.☆14Nov 23, 2024Updated last year
- Wave: Python Domain-Specific Language for High Performance Machine Learning☆45Updated this week
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.☆148May 10, 2025Updated 9 months ago
- Automatic differentiation for Triton Kernels☆29Aug 12, 2025Updated 6 months ago
- Framework to reduce autotune overhead to zero for well known deployments.☆96Sep 19, 2025Updated 5 months ago
- My Paper Reading Lists and Notes.☆21Feb 17, 2026Updated last week
- A GPU FP32 computation method with Tensor Cores.☆26Dec 8, 2025Updated 2 months ago
- Sample Codes using NVSHMEM on Multi-GPU☆30Jan 22, 2023Updated 3 years ago
- FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.☆51Feb 6, 2026Updated 3 weeks ago
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs☆60Mar 25, 2025Updated 11 months ago
- Intel Compiler for SystemC☆27Jun 1, 2023Updated 2 years ago
- ☆31Apr 19, 2025Updated 10 months ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆32Apr 2, 2025Updated 10 months ago
- ☆30Jan 26, 2023Updated 3 years ago
- DeeperGEMM: crazy optimized version☆74May 5, 2025Updated 9 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7)☆66Mar 24, 2025Updated 11 months ago
- Ship correct and fast LLM kernels to PyTorch☆142Jan 14, 2026Updated last month
- A collection of reproducible inference engine benchmarks☆38Apr 22, 2025Updated 10 months ago
- word2vec source code with detailed bilingual annotations (well-annotated word2vec)☆10Oct 3, 2021Updated 4 years ago
- An algorithm that intelligently executes a crypto order over time via Coinbase☆12Oct 26, 2021Updated 4 years ago
- ☆27Dec 3, 2025Updated 2 months ago
- ☆130Aug 18, 2025Updated 6 months ago
- Practical exercises for HOW Series "Deep Dive", a Web-based training on parallel programming and performance optimization☆33Feb 1, 2019Updated 7 years ago
- WaferLLM: Large Language Model Inference at Wafer Scale☆90Jan 7, 2026Updated last month
- Yet another tool to search through your (exported) ChatGPT conversations☆13Dec 24, 2025Updated 2 months ago
- CFD case for simulation of RD107 rocket engine☆11Sep 17, 2025Updated 5 months ago
- [No longer active] A fork of OpenSBI, with software-emulated hypervisor extension support☆42Aug 15, 2025Updated 6 months ago
- An efficient 40% keyboard layout☆11Dec 30, 2023Updated 2 years ago
- lab solutions of ICS course☆10Jan 20, 2013Updated 13 years ago
- ☆40Feb 28, 2020Updated 5 years ago
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer☆96Feb 20, 2026Updated last week
- grafana, prometheus, alertmanager, node-exporter, cadvisor, alertmanager-bot for telegram in docker-compose and awesome grafana dashboard☆11Apr 19, 2023Updated 2 years ago
- Distributed, Replicated, Protocol-generic Key-value Store in Async Rust For SMR Protocols Research☆17Feb 10, 2026Updated 2 weeks ago
- Bare-metal software for TrivialMIPS platform☆11Aug 12, 2019Updated 6 years ago