vedantroy / gpu_kernels (☆22, updated 10 months ago)
Related projects
Alternatives and complementary repositories for gpu_kernels
- Boosting 4-bit inference kernels with 2:4 Sparsity (☆51, updated 2 months ago)
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts (☆34, updated 8 months ago)
- FlexAttention with FlashAttention-3 support (☆27, updated last month)
- Odysseus: Playground of LLM Sequence Parallelism (☆57, updated 5 months ago)
- Transformer components implemented in Triton (☆27, updated last week)
- PyTorch bindings for CUTLASS grouped GEMM (☆53, updated 3 weeks ago)
- Simple and fast low-bit matmul kernels in CUDA / Triton (☆147, updated this week)
- LLaMA INT4 CUDA inference with AWQ (☆48, updated 4 months ago)
- Sparse fine-tuning of LLMs via a modified version of MosaicML's llmfoundry (☆38, updated 10 months ago)
- Quantized attention on GPU (☆31, updated this week)
- Collection of kernels written in the Triton language (☆69, updated 3 weeks ago)
- GPTQ inference TVM kernel (☆36, updated 7 months ago)
- A minimal implementation of vLLM (☆30, updated 3 months ago)
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization (☆87, updated last month)
- NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference (☆61, updated last month)
- Source code for "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" (☆56, updated last month)
- TensorRT-LLM benchmark configuration (☆11, updated 3 months ago)
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024) (☆20, updated 5 months ago)
- Code for Palu: Compressing KV-Cache with Low-Rank Projection (☆57, updated last week)
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs (☆83, updated 3 months ago)
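Several entries above (INT4 inference with AWQ, low-bit matmul kernels, Any-Precision LLM) center on low-bit weight quantization. As a rough, generic illustration of the underlying idea, and not code taken from any listed repository, a symmetric per-group INT4 quantizer can be sketched in plain NumPy (the `group_size=32` choice is an assumption for the example):

```python
import numpy as np

def quantize_int4(w, group_size=32):
    # Symmetric per-group INT4 quantization: each group of `group_size`
    # weights gets one scale so its max magnitude maps into [-8, 7].
    w = w.reshape(-1, group_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_int4(q, scales):
    # Recover approximate float weights from INT4 codes and scales.
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
err = float(np.abs(w - w_hat).max())  # bounded by half a scale step per group
```

The real kernels listed above fuse the dequantize step into the matmul itself (in CUDA, Triton, or TVM) so the packed 4-bit weights never materialize as floats in global memory; this sketch only shows the numerics they implement.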