mlc-ai / relax
☆152 · Updated this week
Related projects
Alternatives and complementary repositories for relax
- Standalone Flash Attention v2 kernel without libtorch dependency ☆98 · Updated 2 months ago
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer ☆85 · Updated 8 months ago
- High-speed GEMV kernels, achieving up to a 2.7x speedup over the PyTorch baseline ☆90 · Updated 4 months ago
- Playing with GEMM in TVM ☆84 · Updated last year
- LLaMA INT4 CUDA inference with AWQ ☆48 · Updated 4 months ago
- A home for the final text of all TVM RFCs ☆101 · Updated last month
- Shared Middle-Layer for Triton Compilation ☆191 · Updated this week
- Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5) ☆209 · Updated 3 weeks ago
- Efficient, Flexible, and Portable Structured Generation ☆53 · Updated this week
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios ☆29 · Updated 2 months ago
- LLaMA/RWKV ONNX models, quantization, and test cases ☆353 · Updated last year
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆206 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆143 · Updated this week
- An experimental CPU backend for Triton ☆56 · Updated last week
- TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles ☆156 · Updated this week
- Unified compiler/runtime for interfacing with PyTorch Dynamo ☆95 · Updated this week
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware ☆100 · Updated 11 months ago
- A model compilation solution for various hardware ☆378 · Updated this week
- Play with MLIR right in your browser ☆124 · Updated last year
- Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores ☆50 · Updated 2 months ago