mlc-ai / mlc-en
Related projects:
- FlashInfer: Kernel Library for LLM Serving
- A curated list of awesome projects and papers for distributed training and inference
- QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
- BitBLAS: a library for mixed-precision matrix multiplication, especially for quantized LLM deployment
- Microsoft Automatic Mixed Precision Library
- A throughput-oriented, high-performance serving framework for LLMs
- GPTQ inference Triton kernel
- FP16xINT4 LLM inference kernel achieving near-ideal ~4x speedups at medium batch sizes of 16-32 tokens
- An open-source, efficient deep learning framework/compiler, written in Python
- Latency and memory analysis of Transformer models for training and inference
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
- Optimized BERT transformer inference on NVIDIA GPUs (https://arxiv.org/abs/2210.03052)
- Zero Bubble Pipeline Parallelism
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
- Serving multiple LoRA-finetuned LLMs as one
- FlagGems: an operator library for large language models, implemented in the Triton language
- Flash Attention in ~100 lines of CUDA (forward pass only)
- A high-throughput and memory-efficient inference and serving engine for LLMs
- A baseline repository for auto-parallelism in neural network training
- Ring Attention implementation with Flash Attention
- Papers and accompanying code for AI systems
- Large Language Model (LLM) Systems Paper List
- Pipeline Parallelism for PyTorch
- KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
- A fast communication-overlapping library for tensor parallelism on GPUs
- A multi-level tensor algebra superoptimizer