maxas: Scott Gray's maxas assembler sgemm, explaining the (for me) missing parts. https://github.com/NervanaSystems/maxas
☆17Dec 22, 2018Updated 7 years ago
Alternatives and similar repositories for maxas-explained
Users that are interested in maxas-explained are comparing it to the libraries listed below.
- Implement asm gemm on vega64 for 4096x4096 fp32 matrix☆22Oct 12, 2019Updated 6 years ago
- ☆29Jan 17, 2025Updated last year
- ☆24Feb 1, 2012Updated 14 years ago
- assembler for NVIDIA FERMI. Imported from Google Code☆77Mar 22, 2015Updated 11 years ago
- A pattern-based algorithmic autotuner for graph processing on GPUs.☆33Jun 25, 2025Updated last year
- Stub for polymorphic code☆11Mar 18, 2023Updated 3 years ago
- Tools to measure an app's App Sandbox usage☆26May 20, 2020Updated 6 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA☆35Jul 28, 2020Updated 5 years ago
- terraform meets nix☆21Sep 7, 2025Updated 9 months ago
- ☆23Jan 27, 2014Updated 12 years ago
- Validates steam_api DLL's and acts a Unity game launcher. It's a dev tool, you know what use you'd have for it.☆13Oct 25, 2020Updated 5 years ago
- ☆12Feb 7, 2013Updated 13 years ago
- GPU Performance Advisor☆66Jul 25, 2022Updated 3 years ago
- Sequential and parallel GEMM implementations with C interface + Benchmark.☆12May 24, 2016Updated 10 years ago
- book for Halide language programming☆13Sep 8, 2021Updated 4 years ago
- New batched algorithm for sparse matrix-matrix multiplication (SpMM)☆16May 7, 2019Updated 7 years ago
- I moved this folder. Keeping this repo up for archival purposes only.☆17Jun 5, 2024Updated 2 years ago
- A GPU performance prediction toolkit for CUDA programs☆18Mar 25, 2019Updated 7 years ago
- Distributed machine learning platform☆13Aug 20, 2015Updated 10 years ago
- ☆17Jul 1, 2020Updated 6 years ago
- Far Cry is a first-person shooter (FPS) video game with amazing graphics, developed by Crytek and published by Ubisoft.☆13May 28, 2019Updated 7 years ago
- ☆13Nov 15, 2022Updated 3 years ago
- Complete solution to enable RDMA (on both InfiniBand and RoCE) and accelerate TCP to bare metal performance on Kubernetes☆11Aug 1, 2018Updated 7 years ago
- Solutions to Flare-On 10 CTF☆14Nov 11, 2023Updated 2 years ago
- ☆21Oct 6, 2021Updated 4 years ago
- Scala staging framework☆18Jul 13, 2018Updated 7 years ago
- A PyTorch implementation of "Self-Supervised GNN that Jointly Learns to Augment" or "Jointly Learnable Data Augmentations for Self-Superv…☆13Dec 13, 2021Updated 4 years ago
- nix to bazel-re proxy☆27Oct 1, 2024Updated last year
- Small set of gdb commands for useful tasks in tvm☆22Jul 10, 2025Updated 11 months ago
- A modified version of Andrej Karpathy's build-nanogpt☆37Oct 26, 2025Updated 8 months ago
- Exported, Nix-based monorepo tooling from TVL. In use for our repo at https://code.tvl.fyi☆30Updated this week
- Frobenius Additive Fourier Transform☆13Jan 22, 2025Updated last year
- Code for reproducing the results in "Forecasting Human Dynamics from Static Images"☆13Jun 16, 2024Updated 2 years ago
- Extract information from macOS about the hardware it supports☆30Feb 14, 2022Updated 4 years ago
- bhSPARSE: A Sparse BLAS Library☆17Nov 6, 2015Updated 10 years ago
- read / write memory from a proxy process by injecting shellcode☆20Dec 23, 2025Updated 6 months ago
- Reviving the old comp-arch.net wiki?☆18Jun 21, 2023Updated 3 years ago
- Zero-Shot Translation implemented by Transformer☆14Mar 24, 2023Updated 3 years ago
- BLAS OpenCL implementation.☆17Apr 8, 2015Updated 11 years ago