Said-Akbar / triton-gcn5
Triton for AMD MI25/50/60. Development repository for the Triton language and compiler
☆32Updated 2 weeks ago
Alternatives and similar repositories for triton-gcn5
Users who are interested in triton-gcn5 are comparing it to the libraries listed below.
- FORK of VLLM for AMD MI25/50/60. A high-throughput and memory-efficient inference and serving engine for LLMs☆65Updated 7 months ago
- vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60☆338Updated this week
- NVIDIA Linux open GPU with P2P support☆94Updated last week
- ☆62Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs☆111Updated this week
- The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm☆613Updated this week
- ☆157Updated 3 weeks ago
- Reinforcement Learning Toolkit for RWKV.(v6,v7,ARWKV) Distillation,SFT,RLHF(DPO,ORPO), infinite context training, Aligning. Exploring the…☆56Updated 2 months ago
- The all-in-one RWKV runtime box with embed, RAG, AI agents, and more.☆589Updated last month
- Fast and memory-efficient exact attention☆202Updated this week
- automatically quant GGUF models☆219Updated last month
- AMD (Radeon GPU) ROCm based setup for popular AI tools on Ubuntu 24.04.1☆216Updated 2 weeks ago
- This project is established for real-time training of the RWKV model.☆50Updated last year
- 8-bit CUDA functions for PyTorch☆69Updated 2 months ago
- A guide to Intel Arc-enabled (maybe) version of @AUTOMATIC1111/stable-diffusion-webui☆55Updated 2 years ago
- a simple Flash Attention v2 implementation with ROCM (RDNA3 GPU, roc wmma), mainly used for stable diffusion(ComfyUI) in Windows ZLUDA en…☆48Updated last year
- ROCm Library Files for gfx1103 and update with others arches based on AMD GPUs for use in Windows.☆708Updated 2 months ago
- The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm☆118Updated this week
- LM inference server implementation based on *.cpp.☆293Updated 2 weeks ago
- A torchless, c++ rwkv implementation using 8bit quantization, written in cuda/hip/vulkan for maximum compatibility and minimum dependenci…☆313Updated last year
- Fresh builds of llama.cpp with AMD ROCm™ 7 acceleration☆129Updated this week
- Example code and documentation on how to get Stable Diffusion running with ONNX FP16 models on DirectML. Can run accelerated on all Direc…☆301Updated 2 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs (Windows build & kernels)☆253Updated 2 weeks ago
- llama.cpp fork with additional SOTA quants and improved performance☆1,387Updated this week
- A converter and basic tester for rwkv onnx☆43Updated last year
- Input your VRAM and RAM and the toolchain will produce a GGUF model tuned to your system within seconds — flexible model sizing and lowes…☆66Updated this week
- run DeepSeek-R1 GGUFs on KTransformers☆258Updated 9 months ago
- RAG SYSTEM FOR RWKV☆50Updated last year
- Running SXM2/SXM3/SXM4 NVidia data center GPUs in consumer PCs☆132Updated 2 years ago
- Inference RWKV with multiple supported backends.☆70Updated last week