fw-ai / llama-cuda-graph-example
Example of applying CUDA graphs to LLaMA-v2
☆12 · Updated 2 years ago
Alternatives and similar repositories for llama-cuda-graph-example
Users interested in llama-cuda-graph-example are comparing it to the libraries listed below.
- ☆71 · Updated 8 months ago
- Ship correct and fast LLM kernels to PyTorch · ☆125 · Updated 3 weeks ago
- ☆113 · Updated last year
- ☆27 · Updated last year
- Extensible collectives library in Triton · ☆91 · Updated 8 months ago
- ☆14 · Updated last month
- Boosting 4-bit inference kernels with 2:4 sparsity · ☆86 · Updated last year
- Triton-based Symmetric Memory operators and examples · ☆65 · Updated last month
- Hydragen: High-Throughput LLM Inference with Shared Prefixes · ☆45 · Updated last year
- Ring-attention experiments