hazan-lab / flash-stu
PyTorch implementation of the Flash Spectral Transform Unit.
☆21 · Sep 19, 2024 · Updated last year
Alternatives and similar repositories for flash-stu
Users interested in flash-stu are comparing it to the repositories listed below.
- Benchmark tests supporting the TiledCUDA library. ☆18 · Nov 19, 2024 · Updated last year
- ☆13 · Jan 7, 2025 · Updated last year
- ☆12 · Mar 7, 2022 · Updated 3 years ago
- ☆42 · Jan 24, 2026 · Updated 3 weeks ago
- ☆35 · Apr 12, 2024 · Updated last year
- ☆105 · Feb 25, 2025 · Updated 11 months ago
- Code and data for the paper "(How) Do Language Models Track State?" ☆21 · Mar 31, 2025 · Updated 10 months ago
- An easily extensible framework for understanding and optimizing CUDA operators, intended for learning purposes only. ☆18 · Jun 13, 2024 · Updated last year
- The curse-of-memory phenomenon of RNNs in sequence modelling. ☆19 · May 8, 2025 · Updated 9 months ago
- ☆22 · May 5, 2025 · Updated 9 months ago
- ☆39 · Dec 14, 2025 · Updated 2 months ago
- ☆20 · Dec 24, 2024 · Updated last year
- A study of CUTLASS. ☆22 · Nov 10, 2024 · Updated last year
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…). ☆28 · Apr 17, 2024 · Updated last year
- Sample codes using NVSHMEM on multi-GPU systems. ☆30 · Jan 22, 2023 · Updated 3 years ago
- A GPU FP32 computation method using Tensor Cores. ☆26 · Dec 8, 2025 · Updated 2 months ago
- Experiments on Multi-Head Latent Attention. ☆99 · Aug 19, 2024 · Updated last year
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024). ☆24 · Jun 6, 2024 · Updated last year
- Experiments on the impact of depth in transformers and SSMs. ☆40 · Oct 23, 2025 · Updated 3 months ago
- An experimental communicating attention kernel based on DeepEP. ☆35 · Jul 29, 2025 · Updated 6 months ago
- ☆29 · Oct 3, 2022 · Updated 3 years ago
- ☆33 · Oct 4, 2024 · Updated last year
- Official code for UnICORNN (ICML 2021). ☆28 · Oct 1, 2021 · Updated 4 years ago
- Official code repository for the paper "Key-value memory in the brain". ☆31 · Feb 25, 2025 · Updated 11 months ago
- FlexAttention with FlashAttention-3 support. ☆27 · Oct 5, 2024 · Updated last year
- DeeperGEMM: a heavily optimized version. ☆73 · May 5, 2025 · Updated 9 months ago
- Code for Draft Attention. ☆99 · May 22, 2025 · Updated 8 months ago
- FlashRNN: fast RNN kernels with I/O awareness. ☆174 · Oct 20, 2025 · Updated 3 months ago
- Accelerated first-order parallel associative scan. ☆196 · Jan 7, 2026 · Updated last month
- Awesome Triton Resources. ☆39 · Apr 27, 2025 · Updated 9 months ago
- Optimizing GEMM with Tensor Cores, step by step. ☆36 · Dec 17, 2023 · Updated 2 years ago
- ☆26 · Dec 3, 2025 · Updated 2 months ago
- An evaluation framework for training-free sparse attention in LLMs. ☆119 · Jan 27, 2026 · Updated 2 weeks ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel. ☆129 · Jun 24, 2025 · Updated 7 months ago
- word2vec source code with detailed bilingual annotations (well-annotated word2vec). ☆10 · Oct 3, 2021 · Updated 4 years ago
- ☆155 · Mar 4, 2025 · Updated 11 months ago
- PTX-EMU: a simple emulator for CUDA programs. ☆37 · Apr 25, 2025 · Updated 9 months ago
- ☆77 · Updated this week
- An implementation of Flash Attention using CuTe. ☆100 · Dec 17, 2024 · Updated last year