a-hamdi/native-sparse-attention

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/a-hamdi/native-sparse-attention)

a-hamdi / native-sparse-attention

☆15

Alternatives and similar repositories for native-sparse-attention

Users that are interested in native-sparse-attention are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

kelechi-c / dit_flow
View on GitHub
DiT (training + flow matching) in Jax
☆12Jan 5, 2025Updated last year
vllm-project / tml-fa4
View on GitHub
FA4-based Relative Attention Kernel developed by TML and Colfax
☆17Jul 17, 2026Updated last week
a-hamdi / read
View on GitHub
Anything I read, whether it's a paper, a book, or an article, I'll post here.
☆11Feb 13, 2025Updated last year
ROCm / rocprofiler-sdk
View on GitHub
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆30May 28, 2026Updated last month
Yifei-Zuo / FlashLLA
View on GitHub
Official repository Flash Local Linear Attention
☆38May 28, 2026Updated last month
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
HydraQYH / expert_specialization_moe
View on GitHub
Expert Specialization MoE Solution based on CUTLASS
☆27Apr 14, 2026Updated 3 months ago
theunnecessarythings / llm-ptx
View on GitHub
GPT2 in handwritten PTX
☆15Jun 29, 2025Updated last year
a-hamdi / IslamicTranslator
View on GitHub
𝐈𝐬𝐥𝐚𝐦𝐢𝐜𝐓𝐫𝐚𝐧𝐬𝐥𝐚𝐭𝐨𝐫 is an automated solution designed to translate 𝐇𝐚𝐝𝐢𝐭𝐡𝐬 into multiple languages using the power …
☆11Jan 17, 2025Updated last year
Michalos88 / Randomized_SVD_in_CUDA
View on GitHub
FAST Randomized SVD on a GPU with CUDA 🏎️
☆16May 21, 2019Updated 7 years ago
yangjunjie0320 / OptimizeDGEMM
View on GitHub
☆11Feb 13, 2025Updated last year
togethercomputer / ParallelKernelBench
View on GitHub
☆44Jul 1, 2026Updated 3 weeks ago
Ratbuyer / h100-features
View on GitHub
☆18Mar 12, 2025Updated last year
NVIDIA / nccl-extensions
View on GitHub
Communication patterns for AI, built on top of NCCL device and host APIs
☆18Updated this week
tile-ai / TileFoundry
View on GitHub
☆54Updated this week
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
kmohan321 / Research_Papers
View on GitHub
☆45Mar 31, 2025Updated last year
osayamenja / FlashMoE
View on GitHub
Distributed MoE in a Single Kernel [NeurIPS '25]
☆273May 5, 2026Updated 2 months ago
aikitoria / nanotrace
View on GitHub
Low overhead tracing library and trace visualizer for pipelined CUDA kernels
☆137Jul 17, 2026Updated last week
gpuasm / autosass
View on GitHub
☆17Mar 29, 2026Updated 3 months ago
1y33 / 100Days
View on GitHub
GPU Kernels
☆225Apr 27, 2025Updated last year
aryagxr / cuda
View on GitHub
coding CUDA everyday!
☆77Feb 5, 2026Updated 5 months ago
mdy666 / Scalable-Flash-Native-Sparse-Attention
View on GitHub
☆48Dec 13, 2025Updated 7 months ago
Lhongpei / ipps-drl
View on GitHub
A PyTorch-Based GPU Parallel Env for IPPS Problem, supporting DRL, IL and Learning Guided MCTS.
☆17Jun 17, 2026Updated last month
OneSignal / onesignal-python-api
View on GitHub
☆22Updated this week
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
liyuan24 / deepseek_from_scratch
View on GitHub
☆18Apr 9, 2025Updated last year
memochou1993 / memochou1993.github.io
View on GitHub
Memo's Blog
☆27Jan 21, 2026Updated 6 months ago
hkproj / triton-flash-attention
View on GitHub
☆258Jan 2, 2025Updated last year
leimao / CUTLASS-Examples
View on GitHub
CUTLASS and CuTe Examples
☆136Nov 30, 2025Updated 7 months ago
rkinas / triton-resources
View on GitHub
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
☆495Mar 10, 2025Updated last year
breez3young / DIMA
View on GitHub
[NIPS'25] Official Implementation of "Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective" in PyTorch.
☆17Nov 11, 2025Updated 8 months ago
Bruce-Lee-LY / cuda_auto_tune
View on GitHub
NCU-driven iterative optimization workflow for CUDA/CUTLASS/Triton/CuTe DSL kernels.
☆23Apr 10, 2026Updated 3 months ago
sotopia-lab / sotopia-rl
View on GitHub
Sotopia-RL: Reward Design for Social Intelligence
☆52Apr 1, 2026Updated 3 months ago
wangjs96 / Graph-in-Graph-Neural-Network
View on GitHub
Graph in Graph Neural Network (https://arxiv.org/abs/2407.00696)
☆16Sep 12, 2024Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
a-hamdi / GPU
View on GitHub
100 days of building GPU kernels!
☆616Apr 27, 2025Updated last year
CentML / lorafusion
View on GitHub
LoRAFusion: Efficient LoRA Fine-Tuning for LLMs
☆28Jul 2, 2026Updated 3 weeks ago
ighoshsubho / awesome-kernel-skills
View on GitHub
Public repo for kernel writing skills in CuTeDSL, Triton, Tilelang and CUDA
☆24Jul 7, 2026Updated 2 weeks ago
esl-epfl / szcore
View on GitHub
This repository hosts an open seizure detection benchmarking platform.
☆21Jul 5, 2026Updated 2 weeks ago
IST-DASLab / MatGPTQ
View on GitHub
Code for MatGPTQ: Accurate and Efficient Post-Training Matryoshka Quantization
☆22Feb 18, 2026Updated 5 months ago
cyhdmjzzy / DeepEP-Code-Analysis
View on GitHub
☆26Feb 27, 2026Updated 4 months ago
Ibrahimshamma / The-Hundred-Page-Machine-Learning-Book
View on GitHub
This repository contains the draft PDF copies of the book: The 100 Page Machine Learning Book
☆13Mar 17, 2019Updated 7 years ago