AMD-AGI/Primus-Turbo

A high-performance acceleration library dedicated to large-scale model training on AMD GPUs

☆67

Alternatives and similar repositories for Primus-Turbo

Users who are interested in Primus-Turbo are comparing it to the libraries listed below.

AMD-AGI / Primus-SaFE
Primus-SaFE (Stability and Fault Endurance)
☆58 · Updated this week
AMD-AGI / Primus
A flexible and high-performance training framework designed for large-scale foundation model training on AMD GPUs
☆108 · Updated this week
AMD-AGI / maxtext-slurm
Toolkit for launching and observing MaxText training on Slurm-managed GPU clusters
☆29 · Jul 19, 2026 · Updated last week
ROCm / TransformerEngine
☆72 · Updated this week
AMD-AGI / torchtitan-amd
A PyTorch native platform for training generative AI models
☆17 · Jun 30, 2026 · Updated 3 weeks ago
ROCm / DeepEP
☆15 · Jun 30, 2026 · Updated 3 weeks ago
AMD-AGI / TraceLens
Automating analysis from trace files
☆84 · Updated this week
ROCm / Megatron-LM
Ongoing research training transformer models at scale
☆43 · Updated this week
ROCm / aiter
AI Tensor Engine for ROCm
☆503 · Updated this week
ROCm / iris
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
☆193 · Updated this week
carlushuang / gcnasm
amdgpu example code in hip/asm
☆66 · Updated this week
ROCm / rocSHMEM
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆146 · Updated this week
ROCm / mori
Modular RDMA Interface
☆157 · Updated this week
ROCm / MAD
MAD (Model Automation and Dashboarding)
☆39 · Updated this week
WaveSpeedAI / QuantumAttention
[WIP] Better (FP8) attention for Hopper
☆33 · Feb 24, 2025 · Updated last year
Zyphra / zcookbook
Training hybrid models for dummies.
☆31 · Nov 1, 2025 · Updated 8 months ago
HazyResearch / HipKittens
Fast and Furious AMD Kernels
☆446 · Jul 10, 2026 · Updated 2 weeks ago
aflah02 / TokenSmith
A comprehensive toolkit for streamlining data editing, search, and inspection for large-scale language model training and interpretabilit…
☆21 · Oct 30, 2025 · Updated 8 months ago
mk1-project / quickreduce
QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.
☆38 · Aug 29, 2025 · Updated 10 months ago
uccl-project / mKernel
mKernel: fast multi-node, multi-GPU fused kernels
☆255 · Jun 21, 2026 · Updated last month
KuangjuX / AttnLink
An experimental communicating attention kernel based on DeepEP.
☆34 · Jul 29, 2025 · Updated 11 months ago
Bruce-Lee-LY / cuda_auto_tune
NCU-driven iterative optimization workflow for CUDA/CUTLASS/Triton/CuTe DSL kernels.
☆23 · Apr 10, 2026 · Updated 3 months ago
ROCm / FlyDSL
FlyDSL is the Python front‑end of the project: Flexible LaYout DSL.
☆249 · Updated this week
AMD-AGI / GEAK
Generating Efficient AI-Centric Kernels
☆131 · Updated this week
ROCm / gfx950-gluon-tutorials
A practical guide to high-performance gluon kernel development on AMD GFX9 GPUs.
☆41 · Updated this week
AMD-AGI / Apex
Agents, and RL environment, for optimizing GPU kernels on AMD ROCm using LLM agents. Benchmarks LLM serving workloads end-to-end, profile…
☆71 · Updated this week
ROCm / ATOM
AiTer Optimized Model
☆144 · Updated this week
onnx / steering-committee
Notes and artifacts from the ONNX steering committee
☆29 · Updated this week
ROCm / omnistat
Scale-out system monitoring
☆25 · Updated this week
huggingface / hf-rocm-kernels
☆24 · May 26, 2026 · Updated 2 months ago
meta-pytorch / torchcomms
torchcomms: a modern PyTorch communications API
☆380 · Updated this week
ROCm / composable_kernel
[DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror
☆539 · Updated this week
deepspeedai / deepspeed-gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
☆21 · Nov 28, 2022 · Updated 3 years ago
nod-ai / ossci-fleet
The goal of the OSSCI Fleet is to provide a central mechanism to enable test automation, batch job scheduling, and developer access to a …
☆13 · Apr 28, 2026 · Updated 2 months ago
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,498 · Updated this week
ROCm / amd_matrix_instruction_calculator
A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
☆140 · Apr 10, 2026 · Updated 3 months ago
ROCm / RIXL
DEPRECATED REPOSITORY. ROCm Inference Transfer Library (RIXL) is a port of the NIXL library for AMD GPUs. See README_rocm.md for AMD spe…
☆15 · Jun 10, 2026 · Updated last month
meta-pytorch / triton-cpu
An experimental CPU backend for Triton (https://github.com/openai/triton)
☆48 · Aug 18, 2025 · Updated 11 months ago
ROCm / flash-attention
Fast and memory-efficient exact attention
☆234 · Jul 16, 2026 · Updated last week