mlc-ai / mlc-python
☆11 Updated this week
Related projects
Alternatives and complementary repositories for mlc-python
- GPTQ inference TVM kernel ☆36 Updated 6 months ago
- An Attention Superoptimizer ☆20 Updated 6 months ago
- ☆11 Updated 3 years ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆15 Updated this week
- Efficient, Flexible and Portable Structured Generation ☆53 Updated this week
- An external memory allocator example for PyTorch. ☆13 Updated 3 years ago
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo ☆19 Updated last year
- ☆18 Updated last month
- TensorRT LLM Benchmark Configuration ☆11 Updated 3 months ago
- Cavs: An Efficient Runtime System for Dynamic Neural Networks ☆13 Updated 4 years ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆17 Updated 2 years ago
- Debug print operator for cudagraph debugging ☆10 Updated 3 months ago
- ☆9 Updated last year
- Optimize tensor program fast with Felix, a gradient descent autotuner. ☆19 Updated 6 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆35 Updated 6 months ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches ☆14 Updated 5 years ago
- ☆47 Updated 2 months ago
- Explore training for quantized models ☆10 Updated last week
- ☆34 Updated this week
- Official repository for "IPDPS'24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices". ☆19 Updated 8 months ago
- Quantized Attention on GPU ☆30 Updated 2 weeks ago
- A minimal implementation of vllm. ☆30 Updated 3 months ago
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer ☆85 Updated 8 months ago
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24) ☆44 Updated 5 months ago
- FlexAttention w/ FlashAttention3 Support ☆27 Updated last month
- llama INT4 cuda inference with AWQ ☆48 Updated 4 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆98 Updated 2 months ago
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆20 Updated last week