DeepLink-org/DLBlas

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/DeepLink-org/DLBlas)

DeepLink-org / DLBlas

DLBlas: clean and efficient kernels

☆43

Alternatives and similar repositories for DLBlas

Users that are interested in DLBlas are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

DeepLink-org / DLCompiler
View on GitHub
triton for dsa
☆68Jul 10, 2026Updated 2 weeks ago
BBuf / tensorrt-llm-moe
View on GitHub
☆34Feb 3, 2025Updated last year
DeepLink-org / DLSlime
View on GitHub
Composable and Embeddable Communication Runtime for Distributed AI Services
☆102Jun 5, 2026Updated last month
serdes21 / flashtile
View on GitHub
FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.
☆61Feb 6, 2026Updated 5 months ago
DeepLink-org / dlinfer
View on GitHub
☆74Updated this week
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
triple-mu / HunyuanDiT-TensorRT-libtorch
View on GitHub
HunyuanDiT with TensorRT and libtorch
☆18May 22, 2024Updated 2 years ago
OpenBMB / infllmv2_cuda_impl
View on GitHub
☆102Feb 11, 2026Updated 5 months ago
pixeli99 / MixLN
View on GitHub
[ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…
☆30Jul 24, 2025Updated last year
sgl-project / sgl-flash-attn
View on GitHub
Fast and memory-efficient exact attention
☆22Jun 26, 2026Updated 3 weeks ago
luliyucoordinate / flash-attention-minimal
View on GitHub
Flash Attention in ~100 lines of CUDA (forward pass only)
☆12Jun 10, 2024Updated 2 years ago
archibate / archibate
View on GitHub
☆22Mar 16, 2026Updated 4 months ago
bxttttt / getting-started-guide-and-introduction-to-MXMACA
View on GitHub
MXMACA入门materials
☆22Jun 9, 2024Updated 2 years ago
dsl-learn / triton-tutorial
View on GitHub
Getting Started with Triton: A Tutorial for Python Beginners
☆61Mar 26, 2026Updated 3 months ago
toyaix / tritonllm
View on GitHub
LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model
☆119Apr 28, 2026Updated 2 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
caijixueIT / CUDA_Learning_for_Freshman
View on GitHub
☆14Nov 3, 2025Updated 8 months ago
cherichy / tilecute
View on GitHub
☆32Jul 2, 2025Updated last year
kvcache-ai / kvcache-blog
View on GitHub
☆19Updated this week
AXERA-TECH / OWLVIT-ONNX-AX650-CPP
View on GitHub
☆23Jan 3, 2024Updated 2 years ago
RightNow-AI / StreamIndex
View on GitHub
Memory-bounded compressed sparse attention via streaming top-k. Triton kernels for the DeepSeek-V4 lightning indexer. 32x regime extensio…
☆22May 5, 2026Updated 2 months ago
triple-mu / Qwen-Image-TensorRT
View on GitHub
Qwen-Image's DiT inference with TensorRT-10
☆21Oct 13, 2025Updated 9 months ago
SJTU-DENG-Lab / Diffulex
View on GitHub
Flexible and Pluggable Serving Engine for Diffusion LLMs
☆147Jul 13, 2026Updated last week
ai-compiler-study / triton-kernels
View on GitHub
Triton kernels for Flux
☆23Jul 7, 2025Updated last year
fishmingyu / GeoT
View on GitHub
GeoT: Tensor Centric Library for Graph Neural Network via Efficient Segment Reduction on GPU
☆24Mar 27, 2025Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
RangiLyu / llama.mmengine
View on GitHub
Training LLaMA language model with MMEngine! It supports LoRA fine-tuning!
☆40Apr 2, 2023Updated 3 years ago
InternLM / Kernel-Smith
View on GitHub
☆27Mar 31, 2026Updated 3 months ago
NgCafai / deep-learning-system
View on GitHub
Homework of CMU 10-414/714: Deep Learning Systems (https://dlsyscourse.org/)
☆15Mar 21, 2024Updated 2 years ago
HydraQYH / hp_rms_norm
View on GitHub
High performance RMSNorm Implement by using SM Core Storage(Registers and Shared Memory)
☆30Jan 22, 2026Updated 6 months ago
DeepLink-org / deeplink.framework
View on GitHub
☆76Oct 31, 2024Updated last year
Dao-AILab / quack
View on GitHub
A Quirky Assortment of CuTe Kernels
☆1,066Updated this week
ant-research / M2-Miner
View on GitHub
[ICLR 2026] M2-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining
☆55Apr 22, 2026Updated 3 months ago
facebookresearch / sparse-delta-memory
View on GitHub
This repositories contains the reference implementation for the Sparse Delta Memory paper.More precisely, it contains the model definitio…
☆31Jul 9, 2026Updated 2 weeks ago
enyac-group / Elana
View on GitHub
Elana: A Simple Energy & Latency Analyzer for LLMs
☆16Apr 3, 2026Updated 3 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
inclusionAI / GroveMoE
View on GitHub
☆24Aug 20, 2025Updated 11 months ago
perplexityai / pplx-kernels
View on GitHub
Perplexity GPU Kernels
☆591Nov 7, 2025Updated 8 months ago
luongthecong123 / fp8-quant-matmul
View on GitHub
Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge.
☆19Feb 9, 2026Updated 5 months ago
KuangjuX / cuda-evolve-oss
View on GitHub
Autonomous GPU kernel optimization system driven by AI agents.
☆31Mar 29, 2026Updated 3 months ago
fzyzcjy / torch_memory_saver
View on GitHub
Allow torch tensor memory to be released and resumed later
☆260Updated this week
ByteDance-Seed / Triton-distributed
View on GitHub
Distributed Compiler based on Triton for Parallel Systems
☆1,498Updated this week
dgSPARSE / dgNN
View on GitHub
[Mlsys'22] Understanding gnn computational graph: A coordinated computation, io, and memory perspective
☆22Sep 11, 2023Updated 2 years ago