mlc-ai / mlc-en
☆452Updated this week
Alternatives and similar repositories for mlc-en
Users who are interested in mlc-en are comparing it to the repositories listed below.
- ☆218Updated last year
- An open-source, efficient deep learning framework/compiler, written in Python.☆739Updated 4 months ago
- Fast low-bit matmul kernels in Triton☆423Updated 3 weeks ago
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.☆741Updated 5 months ago
- A curated list of awesome projects and papers for distributed training or inference☆263Updated last year
- GPTQ inference Triton kernel☆316Updated 2 years ago
- An experimental CPU backend for Triton☆170Updated 2 months ago
- Latency and Memory Analysis of Transformer Models for Training and Inference☆475Updated 8 months ago
- ☆271Updated this week
- ☆623Updated 3 weeks ago
- ☆192Updated 2 years ago
- A model compilation solution for various hardware☆461Updated 4 months ago
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se…☆802Updated 10 months ago
- AI and Memory Wall☆225Updated last year
- A Quirky Assortment of CuTe Kernels☆749Updated this week
- Perplexity GPU Kernels☆552Updated 2 months ago
- Collection of kernels written in Triton language☆174Updated 9 months ago
- Shared Middle-Layer for Triton Compilation☆323Updated last month
- An Easy-to-understand TensorOp Matmul Tutorial☆404Updated last week
- Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052☆476Updated last year
- Zero Bubble Pipeline Parallelism☆447Updated 8 months ago
- Applied AI experiments and examples for PyTorch☆312Updated 4 months ago
- A collection of memory efficient attention operators implemented in the Triton language.☆287Updated last year
- Cataloging released Triton kernels.☆282Updated 4 months ago
- ☆256Updated last year
- A library to analyze PyTorch traces.☆456Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators☆508Updated this week
- Backward compatible ML compute opset inspired by HLO/MHLO☆590Updated last week
- ☆145Updated 11 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.☆310Updated this week