☆153Jul 4, 2025Updated 8 months ago
Alternatives and similar repositories for mdy_triton
Users that are interested in mdy_triton are comparing it to the libraries listed below
- ☆104Sep 9, 2024Updated last year
- Implements FP8 flash attention on the Ada architecture using the cutlass repository☆79Aug 12, 2024Updated last year
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to…☆30Updated this week
- Source-to-Source Debuggable Derivatives in Pure Python☆15Jan 23, 2024Updated 2 years ago
- Implement Flash Attention using Cute.☆102Dec 17, 2024Updated last year
- DeeperGEMM: crazy optimized version☆74May 5, 2025Updated 10 months ago
- Distributed Compiler based on Triton for Parallel Systems☆1,380Feb 13, 2026Updated 3 weeks ago
- Transformers components but in Triton☆34May 9, 2025Updated 10 months ago
- Stable Diffusion + LCM on SG2300X: silky-smooth image generation in one second☆17Nov 29, 2024Updated last year
- Counting-Stars (★)☆83Nov 24, 2025Updated 3 months ago
- ☆301Updated this week
- LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.☆867Dec 10, 2025Updated 2 months ago
- A bunch of kernels that might make stuff slower 😉☆75Mar 2, 2026Updated last week
- Using FlexAttention to compute attention with different masking patterns☆47Sep 22, 2024Updated last year
- Efficient triton implementation of Native Sparse Attention.☆269May 23, 2025Updated 9 months ago
- ☆97Mar 26, 2025Updated 11 months ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer☆165Feb 11, 2026Updated 3 weeks ago
- ☆124May 28, 2024Updated last year
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆107Jun 28, 2025Updated 8 months ago
- Cataloging released Triton kernels.☆296Sep 9, 2025Updated 6 months ago
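- GPGPU-Sim code with Chinese annotations: the latest GPGPU-Sim simulator source, commented in Chinese to help Chinese-speaking users better understand and use the simulator.☆28Dec 18, 2024Updated last year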
- Text2speech & tone color conversion demo running on SG2300x, combining openvoice and emotivoice for TTS + instant voice cloning☆22Oct 30, 2024Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer☆239Jun 15, 2025Updated 8 months ago
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"☆974Feb 5, 2026Updated last month
- Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference☆161Oct 13, 2025Updated 4 months ago
- ☆227Nov 19, 2025Updated 3 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)☆24Jun 6, 2024Updated last year
- WebResearcher: An Iterative Deep-Research Agent☆40Feb 13, 2026Updated 3 weeks ago
- Adds a new Cpu0 backend to llvm17.0.6☆12Apr 22, 2024Updated last year
- Official Implementation of ACL2023: Don't Parse, Choose Spans! Continuous and Discontinuous Constituency Parsing via Autoregressive Span …☆14Aug 25, 2023Updated 2 years ago
- Advanced Formal Language Theory (263-5352-00L; Spring 2023)☆10Feb 21, 2023Updated 3 years ago
- A std::execution style runtime context and High Performance RPC Transport for using OpenUCX. Including CUDA/ROCM/... devices with RDMA.☆29Feb 22, 2026Updated 2 weeks ago
- PyTorch implementation for PaLM: A Hybrid Parser and Language Model.☆10Jan 7, 2020Updated 6 years ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.☆329Updated this week
- Hands-On Practical MLIR Tutorial☆52Aug 21, 2025Updated 6 months ago
- Puzzles for learning Triton, play it with minimal environment configuration!☆637Dec 28, 2025Updated 2 months ago
- ☆50Jun 16, 2025Updated 8 months ago
- ☆42Nov 1, 2025Updated 4 months ago
- Source code of the paper "Prediction of Molecular Absorption Wavelength Using Deep Neural Networks"☆10May 29, 2022Updated 3 years ago