Fast and efficient attention method exploration and implementation.
☆25 Mar 25, 2025 Updated 11 months ago
Alternatives and similar repositories for FlashMLA
Users who are interested in FlashMLA are comparing it to the libraries listed below.
- An NVIDIA AI Workbench Example Project for Finetuning Llama 2 ☆35 Aug 29, 2024 Updated last year
- Emulating DMA Engines on GPUs for Performance and Portability ☆41 May 17, 2015 Updated 10 years ago
- DLBlas: clean and efficient kernels ☆35 Updated this week
- A series of high-performance GEMM (General Matrix Multiply) implementations, iteratively optimised for H100 GPUs in pure CUDA. ☆73 Feb 18, 2026 Updated last month
- ☆116 May 16, 2025 Updated 10 months ago
- SC 2021, "LogECMem: Coupling Erasure-Coded In-Memory Key-Value Stores with Parity Logging" ☆12 Jul 12, 2021 Updated 4 years ago
- MXMACA introductory materials ☆21 Jun 9, 2024 Updated last year
- ☆60 Nov 21, 2024 Updated last year
- Export Blender (2.4x) curves to TikZ format for use with TeX ☆13 Apr 18, 2014 Updated 11 years ago
- create concept map from textbook data ☆11 May 4, 2018 Updated 7 years ago
- ONCache: A Cache-Based Low-Overhead Container Overlay Network ☆21 Jun 7, 2025 Updated 9 months ago
- ☆13 Aug 1, 2025 Updated 7 months ago
- Artifacts of EVT ASPLOS'24 ☆30 Mar 6, 2024 Updated 2 years ago
- EPOCH Input System Version 2 ☆10 Jun 5, 2020 Updated 5 years ago
- a simple WIP runtime reflection library ☆13 May 11, 2022 Updated 3 years ago
- High-performance LLM operator library built on TileLang. ☆93 Updated this week
- ☆74 Updated this week
- ☆17 Dec 9, 2024 Updated last year
- An easy and flexible mathematical programming environment for Python. ☆12 Jun 16, 2018 Updated 7 years ago
- Load your graph of bookmarks and tags into a neo4j database and explore it ☆14 Jan 1, 2017 Updated 9 years ago
- ☆16 Oct 13, 2023 Updated 2 years ago
- Global Address SPace toolbox -- Julia wrapper ☆10 Nov 17, 2017 Updated 8 years ago
- Matrix Algebra on GPU and Multicore Architectures (MAGMA) source releases from http://icl.cs.utk.edu/magma/index.html ☆25 Jun 4, 2015 Updated 10 years ago
- G'MIC-Qt is a versatile front-end to the image processing framework G'MIC. ☆17 Updated this week
- ☆20 Nov 7, 2023 Updated 2 years ago
- 🐝 Tiny CLI to post simultaneously to Mastodon and Bluesky ☆17 Sep 14, 2025 Updated 6 months ago
- Prototype for a SPIR-V assembler and disassembler. It provides a composable Java interface for generating SPIR-V code at runtime. ☆13 Oct 31, 2025 Updated 4 months ago
- ☆19 Nov 23, 2021 Updated 4 years ago
- AMD’s C++ library for accelerating tensor primitives ☆49 Updated this week
- Simple, lightweight transformers in Fortran ☆17 Nov 17, 2023 Updated 2 years ago
- ☆16 Sep 27, 2018 Updated 7 years ago
- Phi-2 Colab Notebook ☆14 Dec 14, 2023 Updated 2 years ago
- Reconstruction of distorted underwater images using robust registration ☆15 Apr 16, 2019 Updated 6 years ago
- Code for paper "Beyond Closure Models: Learning Chaotic Systems via Physics-Informed Neural Operators". ☆16 Dec 24, 2025 Updated 2 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆12 Mar 2, 2026 Updated 2 weeks ago
- SODECL is a library of ordinary differential equation (ODE) and stochastic differential equation (SDE) solvers in OpenCL. ☆11 Jul 4, 2020 Updated 5 years ago
- PARADIS, a lightweight and flexible weather forecast model that tries to Keep It Simple. ☆27 Mar 4, 2026 Updated 2 weeks ago
- Rename files in the same way you edit text ☆16 Oct 1, 2025 Updated 5 months ago
- Argonne Leadership Computing Facility OpenCL tutorial ☆10 Aug 22, 2025 Updated 7 months ago