sgl-project / tensorrt-demo
TensorRT LLM Benchmark Configuration
☆13 · Updated 7 months ago
Alternatives and similar repositories for tensorrt-demo:
Users who are interested in tensorrt-demo compare it to the libraries listed below.
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆18 · Updated this week
- GPTQ inference TVM kernel ☆39 · Updated 10 months ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. By pro… ☆68 · Updated this week
- Benchmark tests supporting the TiledCUDA library. ☆15 · Updated 4 months ago
- Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA core for the decoding stage of LLM inference. ☆35 · Updated 2 weeks ago
- Quantized Attention on GPU ☆45 · Updated 4 months ago
- ☆19 · Updated 5 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆66 · Updated 9 months ago
- ☆24 · Updated 3 months ago
- Framework to reduce autotune overhead to zero for well known deployments. ☆63 · Updated this week
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance. ☆59 · Updated 2 weeks ago
- ☆88 · Updated 6 months ago
- Multiple GEMM operators are constructed with cutlass to support LLM inference. ☆17 · Updated 5 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆74 · Updated 4 months ago
- ☆26 · Updated this week
- Summary of system papers/frameworks/codes/tools on training or serving large model ☆56 · Updated last year
- [ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆30 · Updated 3 weeks ago
- ☆46 · Updated 2 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆106 · Updated 6 months ago
- ☆64 · Updated 2 months ago
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer ☆89 · Updated 3 weeks ago
- GPU operators for sparse tensor operations ☆31 · Updated last year
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆23 · Updated last month