mlc-ai / xgrammar (☆22)

Related projects

Alternatives and complementary repositories for xgrammar:
- Debug print operator for CUDA graph debugging (☆10)
- GPTQ inference TVM kernel (☆35)
- Quantized attention on GPU (☆29)
- PyTorch bindings for CUTLASS grouped GEMM (☆53)
- Triton-to-TVM transpiler (☆16)
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable (☆111)
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer (☆85)
- Standalone Flash Attention v2 kernel without libtorch dependency (☆98)
- An extension of TVMScript for writing simple, high-performance GPU kernels with tensor cores (☆49)
- TiledCUDA, a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles (☆151)
- An experimental CPU backend for Triton (☆56)
- A sparse attention kernel supporting mixed sparse patterns (☆53)
- Materials for learning SGLang (☆86)
- A suite for parallel inference of Diffusion Transformers (DiTs) on multi-GPU clusters (☆32)
- TiledKernel, a code-generation library based on macro kernels and a memory-hierarchy graph data structure (☆19)
- Documentation for TVM Unity (☆11)
- TensorRT-LLM benchmark configuration (☆11)
- Automated parallelization system and infrastructure for multiple ecosystems (☆75)
- Optimize tensor programs fast with Felix, a gradient-descent autotuner (☆19)
- Extensible collectives library in Triton (☆65)
- ThrillerFlow, a dataflow analysis and codegen framework written in Rust (☆10)
- High-performance Transformer implementation in C++ (☆80)
- LLaMA INT4 CUDA inference with AWQ (☆47)
- Official PyTorch implementation of FlatQuant: Flatness Matters for LLM Quantization