Said-Akbar / triton-gcn5
Triton for AMD MI25/50/60. Development repository for the Triton language and compiler.
☆14 · Updated 2 weeks ago
Alternatives and similar repositories for triton-gcn5:
Users who are interested in triton-gcn5 are comparing it to the libraries listed below.
- Fork of vLLM for AMD MI25/50/60. A high-throughput and memory-efficient inference and serving engine for LLMs ☆23 · Updated 3 weeks ago
- Fast and memory-efficient exact attention ☆163 · Updated this week
- Run DeepSeek-R1 GGUFs on KTransformers ☆212 · Updated 3 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆69 · Updated this week
- llama.cpp fork with additional SOTA quants and improved performance ☆231 · Updated this week
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, rocWMMA), mainly used for Stable Diffusion (ComfyUI) in Windows ZLUDA en… ☆37 · Updated 7 months ago
- Make abliterated models with transformers, easy and fast ☆64 · Updated last week
- A Docker image based on rocm/pytorch with support for gfx803 (Polaris 20-21 (XT/PRO/XL); RX580; RX570; RX560) and Python 3.8 ☆23 · Updated last year
- Implementation of the RWKV language model in pure WebGPU/Rust. ☆297 · Updated this week
- Run RWKV inference with multiple supported backends. ☆39 · Updated this week
- 4-bit quantization of LLMs using GPTQ ☆48 · Updated last year
- Development repository for the Triton language and compiler ☆114 · Updated this week
- Croco.Cpp is a 3rd party testground for KoboldCPP, a simple one-file way to run various GGML/GGUF models with KoboldAI's UI. (for Croco.C… ☆100 · Updated last week
- ☆43 · Updated last year
- Fine-tuning RWKV-World model ☆25 · Updated last year
- Fork of the Triton language and compiler for Windows support and easy installation ☆755 · Updated this week
- Example code and documentation on how to get Stable Diffusion running with ONNX FP16 models on DirectML. Can run accelerated on all Direc… ☆299 · Updated last year
- Next generation BLAS implementation for ROCm platform ☆361 · Updated this week
- Simple monkeypatch to boost AMD Navi 3 GPUs ☆36 · Updated 10 months ago
- Multi AMD GPU Setup for AI Development on Ubuntu with ROCm ☆26 · Updated last week
- 8-bit CUDA functions for PyTorch, ported to HIP for use in AMD GPUs ☆49 · Updated last year
- My personal fork of koboldcpp where I hack in experimental samplers. ☆44 · Updated 10 months ago
- 8-bit CUDA functions for PyTorch ☆46 · Updated last month
- Implements harmful/harmless refusal removal using pure HF Transformers ☆709 · Updated 9 months ago
- The all-in-one RWKV runtime box with embed, RAG, AI agents, and more. ☆548 · Updated 3 weeks ago
- CUDA on AMD GPUs ☆422 · Updated last week
- 8-bit CUDA functions for PyTorch, ROCm-compatible ☆39 · Updated last year
- Automatically quantize GGUF models ☆164 · Updated this week
- AI Inferencing at the Edge. A simple one-file way to run various GGML models with KoboldAI's UI with AMD ROCm offloading ☆582 · Updated last week
- A torchless C++ RWKV implementation using 8-bit quantization, written in CUDA/HIP/Vulkan for maximum compatibility and minimum dependenci… ☆310 · Updated last year