rogerallen / llama2.cu
Inference Llama 2 in one file of pure C & one file with CUDA
☆16 · Updated last year
Related projects
Alternatives and complementary repositories for llama2.cu
- LLM training in simple, raw C/CUDA ☆86 · Updated 6 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆35 · Updated 6 months ago
- ☆46 · Updated last month
- OpenAI Triton backend for Intel® GPUs ☆143 · Updated this week
- ☆145 · Updated this week
- ☆96 · Updated last month
- ☆114 · Updated 6 months ago
- ☆47 · Updated 2 weeks ago
- Applied AI experiments and examples for PyTorch ☆160 · Updated last week
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆95 · Updated this week
- ☆162 · Updated 4 months ago
- An efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5). ☆196 · Updated 2 weeks ago
- Simple and fast low-bit matmul kernels in CUDA / Triton ☆140 · Updated this week
- Ahead of Time (AOT) Triton Math Library ☆40 · Updated this week
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆53 · Updated this week
- High-speed GEMV kernels, at most 2.7x speedup compared to PyTorch baseline. ☆87 · Updated 4 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆98 · Updated 2 months ago
- Llama INT4 CUDA inference with AWQ ☆47 · Updated 4 months ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand-engineered and codegen kernels. ☆123 · Updated last year
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. ☆410 · Updated this week
- Cataloging released Triton kernels. ☆133 · Updated 2 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆46 · Updated this week
- ☆156 · Updated last month
- Materials for learning SGLang ☆86 · Updated this week
- Advanced Quantization Algorithm for LLMs. This is official implementation of "Optimize Weight Rounding via Signed Gradient Descent for t… ☆245 · Updated this week
- Extensible collectives library in Triton ☆65 · Updated last month
- Collection of kernels written in Triton language ☆63 · Updated 2 weeks ago
- Shared Middle-Layer for Triton Compilation ☆188 · Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆268 · Updated this week
- ☆83 · Updated 5 months ago
- Development repository for the Triton language and compiler ☆93 · Updated this week