ppl-ai / pplx-kernels
Perplexity GPU Kernels
☆375 Updated 2 weeks ago
Alternatives and similar repositories for pplx-kernels
Users who are interested in pplx-kernels are comparing it to the libraries listed below.
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆397 Updated 3 weeks ago
- Distributed Compiler Based on Triton for Parallel Systems ☆846 Updated last week
- A low-latency & high-throughput serving engine for LLMs ☆380 Updated 3 weeks ago
- Zero Bubble Pipeline Parallelism ☆398 Updated last month
- NVIDIA Inference Xfer Library (NIXL) ☆422 Updated this week
- Efficient and easy multi-instance LLM serving ☆437 Updated this week
- Fastest kernels written from scratch ☆281 Updated 2 months ago
- Fast low-bit matmul kernels in Triton ☆322 Updated last week
- A lightweight design for computation-communication overlap. ☆143 Updated this week
- Materials for learning SGLang ☆443 Updated last week
- DeepSeek-V3/R1 inference performance simulator ☆149 Updated 2 months ago
- ☆212 Updated 11 months ago
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ☆705 Updated 3 months ago
- ☆90 Updated 5 months ago
- Applied AI experiments and examples for PyTorch ☆277 Updated 3 weeks ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆379 Updated this week
- Ultra and Unified CCL ☆165 Updated this week
- kernels, of the mega variety ☆390 Updated 3 weeks ago
- Cataloging released Triton kernels. ☆238 Updated 5 months ago
- KV cache store for distributed LLM inference ☆269 Updated 2 weeks ago
- Disaggregated serving system for Large Language Models (LLMs). ☆617 Updated 2 months ago
- LLM KV cache compression made easy ☆520 Updated this week
- High performance Transformer implementation in C++. ☆125 Updated 5 months ago
- A throughput-oriented high-performance serving framework for LLMs ☆825 Updated 3 weeks ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆425 Updated 3 weeks ago
- A large-scale simulation framework for LLM inference ☆387 Updated 7 months ago
- nnScaler: Compiling DNN models for Parallel Training ☆113 Updated this week
- flash attention tutorial written in python, triton, cuda, cutlass ☆377 Updated last month
- An Easy-to-understand TensorOp Matmul Tutorial ☆364 Updated 9 months ago
- Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of pap… ☆255 Updated 3 months ago