☆64Mar 25, 2026Updated this week
Alternatives and similar repositories for Primus-Turbo
Users who are interested in Primus-Turbo are comparing it to the libraries listed below.
- ☆82Updated this week
- An experimental communicating attention kernel based on DeepEP.☆35Jul 29, 2025Updated 7 months ago
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming☆182Mar 20, 2026Updated last week
- [WIP] Better (FP8) attention for Hopper☆32Feb 24, 2025Updated last year
- A PyTorch native platform for training generative AI models☆16Nov 18, 2025Updated 4 months ago
- Primus-SaFE(Stability and Fault Endurance)☆53Mar 20, 2026Updated last week
- Markovian State and Action Abstractions for MDPs via Hierarchical MCTS within a POMDP Formulation☆11Jul 26, 2016Updated 9 years ago
- ☆18Apr 16, 2025Updated 11 months ago
- paNote: a graph note-taking application that can be deployed as a blog or used as an Electron app☆12Jun 15, 2024Updated last year
- An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.☆21Nov 28, 2022Updated 3 years ago
- The goal of the OSSCI Fleet is to provide a central mechanism to enable test automation, batch job scheduling, and developer access to a …☆13Feb 27, 2026Updated 3 weeks ago
- some mixture of experts architecture implementations☆26Mar 22, 2024Updated 2 years ago
- ☆19Feb 25, 2026Updated last month
- Multi-Level Triton Runner supporting Python, IR, PTX, and cubin.☆84Updated this week
- The official implementation of ImageBind-LLM and Whisper-LLM from the paper "Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Compre…☆21Oct 30, 2023Updated 2 years ago
- Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocatio…☆94Sep 11, 2025Updated 6 months ago
- Crawled Wikipedia Tables with Passages☆13Aug 19, 2021Updated 4 years ago
- ☆64Updated this week
- Project showing how to develop NKI kernels for Llama 3.2 1B inference☆21May 29, 2025Updated 9 months ago
- ☆20Aug 20, 2025Updated 7 months ago
- Notes and artifacts from the ONNX steering committee☆28Updated this week
- Slides and other materials for club meetings☆17Jun 26, 2022Updated 3 years ago
- KsanaDiT: High-Performance DiT (Diffusion Transformer) Inference Framework for Video & Image Generation☆46Mar 6, 2026Updated 3 weeks ago
- ☆65Apr 26, 2025Updated 11 months ago
- Random collection of research papers and projects I find interesting☆20May 20, 2021Updated 4 years ago
- ☆12Apr 9, 2025Updated 11 months ago
- Compact LSTM inference kernel (CLINK) designed in C/HLS for FPGA implementation.☆17Oct 7, 2019Updated 6 years ago
- Ongoing research training transformer models at scale☆39Updated this week
- Small shim that allows AWS Cognito to talk to github (by providing an OpenID wrapper around the Github API)☆15Feb 2, 2023Updated 3 years ago
- An efficient implementation of learned optimizers in PyTorch☆45Dec 2, 2025Updated 3 months ago
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters☆57Jul 23, 2024Updated last year
- Open Character Training☆78Nov 24, 2025Updated 4 months ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.☆38Aug 29, 2025Updated 6 months ago
- Triton Implementation of HyperAttention Algorithm☆48Dec 11, 2023Updated 2 years ago
- Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c…☆27Dec 10, 2022Updated 3 years ago
- [ICLR2026] The code for "Interp3D: Correspondence-Aware Interpolation for Generative Textured 3D Morphing."☆26Jan 21, 2026Updated 2 months ago
- ☆15Aug 15, 2024Updated last year
- Tutorials for NVIDIA CUPTI samples☆61Nov 3, 2025Updated 4 months ago
- Benchmark Suite for Heterogeneous FFT Implementations☆35Jan 6, 2024Updated 2 years ago