sgl-project / tensorrt-demo
TensorRT LLM Benchmark Configuration
☆13 · Updated 6 months ago
Alternatives and similar repositories for tensorrt-demo:
Users who are interested in tensorrt-demo are comparing it to the libraries listed below:
- TileFusion is a highly efficient kernel template library designed to elevate the level of abstraction in CUDA C for processing tiles. ☆56 · Updated this week
- GPTQ inference TVM kernel ☆38 · Updated 9 months ago
- Benchmark tests supporting the TiledCUDA library. ☆15 · Updated 3 months ago
- Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators ☆18 · Updated 2 weeks ago
- Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the decoding stage of LLM inference. ☆29 · Updated 3 months ago
- Quantized Attention on GPU ☆34 · Updated 2 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆64 · Updated 8 months ago
- Framework to reduce autotune overhead to zero for well-known deployments. ☆61 · Updated 3 weeks ago
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆22 · Updated 8 months ago
- A minimal implementation of vLLM. ☆33 · Updated 6 months ago
- GPU operators for sparse tensor operations ☆30 · Updated 11 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆64 · Updated 3 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆89 · Updated this week
- An Attention Superoptimizer ☆21 · Updated last month
- Summary of systems papers, frameworks, code, and tools for training or serving large models ☆56 · Updated last year
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆38 · Updated 11 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆104 · Updated 5 months ago
- Transformer components implemented in Triton ☆31 · Updated 3 months ago
- Debug print operator for CUDA graph debugging ☆10 · Updated 6 months ago