Said-Akbar / triton-gcn5
Triton for AMD MI25/50/60. Development repository for the Triton language and compiler
☆26 Updated 2 months ago
Alternatives and similar repositories for triton-gcn5
Users who are interested in triton-gcn5 are comparing it to the libraries listed below.
- Fork of vLLM for AMD MI25/50/60. A high-throughput and memory-efficient inference and serving engine for LLMs ☆49 Updated last month
- vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60 ☆63 Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆79 Updated this week
- The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm ☆132 Updated this week
- ☆36 Updated this week
- Fast and memory-efficient exact attention ☆173 Updated this week
- ROCm Library Files for gfx1103 and update with other arches based on AMD GPUs for use in Windows. ☆514 Updated 4 months ago
- 8-bit CUDA functions for PyTorch, ROCm compatible ☆41 Updated last year
- Make PyTorch models at least run on APUs. ☆55 Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs (Windows build & kernels) ☆42 Updated last week
- 8-bit CUDA functions for PyTorch ☆53 Updated 3 weeks ago
- 8-bit CUDA functions for PyTorch, ported to HIP for use in AMD GPUs ☆49 Updated 2 years ago
- NVIDIA Linux open GPU with P2P support ☆25 Updated 2 weeks ago
- Easily deploy your rwkv model ☆17 Updated 2 years ago
- A lightweight cluster manager that turns your small fleet of nodes into one powerful computer, using Docker for environment consistency w… ☆50 Updated 2 weeks ago
- Linux based GDDR6/GDDR6X VRAM temperature reader for NVIDIA RTX 3000/4000 series GPUs. ☆98 Updated last month
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆86 Updated this week
- RWKV models and examples powered by candle. ☆18 Updated 3 months ago
- Running SXM2/SXM3/SXM4 NVIDIA data center GPUs in consumer PCs ☆109 Updated last year
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, rocWMMA), mainly used for Stable Diffusion (ComfyUI) in Windows ZLUDA en… ☆43 Updated 9 months ago
- A Python package for extending the official PyTorch that can easily obtain performance on Intel platform ☆47 Updated 5 months ago
- Make abliterated models with transformers, easy and fast ☆73 Updated last month
- Development repository for the Triton language and compiler ☆122 Updated this week
- llama.cpp fork with additional SOTA quants and improved performance ☆548 Updated this week
- Automatically quantize GGUF models ☆181 Updated this week
- Prebuilt Windows ROCm Libs for gfx1031 and gfx1032 ☆139 Updated 2 months ago
- A converter and basic tester for rwkv onnx ☆41 Updated last year
- ☆43 Updated last year
- PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu ☆67 Updated 6 months ago
- CUDA on AMD GPUs ☆502 Updated 3 weeks ago