☆20Nov 7, 2019Updated 6 years ago
Alternatives and similar repositories for NVIDIA-tensor-core-examples
Users that are interested in NVIDIA-tensor-core-examples are comparing it to the libraries listed below.
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API☆37Sep 15, 2023Updated 2 years ago
- ☆11Apr 10, 2019Updated 7 years ago
- ☆34Apr 2, 2025Updated last year
- General Matrix Multiplication using NVIDIA Tensor Cores☆28Jan 25, 2025Updated last year
- This repository mirrors the principal Gitlab repository of the Chebyshev Accelerated Subspace iteration Eigensolver. If you want to contr…☆19May 5, 2026Updated last month
- FPGA-based HyperLogLog Accelerator☆12Jul 13, 2020Updated 5 years ago
- AnacondaCON 2019 GPU Deep Learning Tutorial☆16Updated this week
- Simple example of how to write an Implicit GEMM Convolution in CUDA using the tensor core WMMA API and bindings for PyTorch.☆19Jun 29, 2023Updated 3 years ago
- ☆28Updated this week
- 2D and 3D Matrix Convolution and Matrix Multiplication with CUDA☆10Jun 14, 2021Updated 5 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs☆80Mar 27, 2023Updated 3 years ago
- Benchmarks to capture important workloads.☆33Apr 1, 2026Updated 2 months ago
- A scalable implementation of the multifrontal method for symmetric and Hermitian systems (with intrafrontal pivoting)☆19Jun 27, 2016Updated 10 years ago
- NVIDIA Performance Libraries: Sample code☆23May 28, 2026Updated last month
- An OpenMP runtime implemented using HPX☆25Aug 4, 2022Updated 3 years ago
- ☆108May 31, 2025Updated last year
- GPU-accelerated AES encryption project☆11Feb 13, 2015Updated 11 years ago
- Official implementation of Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores.☆17Nov 13, 2025Updated 7 months ago
- Stencil with Optimized Dataflow Architecture☆12Feb 27, 2024Updated 2 years ago
- Sample repo for blog post about using local Maven repo☆14Apr 4, 2024Updated 2 years ago
- Resources for the introductory GPU programming course of the HPC-AI specialized master's program☆23Jan 11, 2024Updated 2 years ago
- Study of Ampere's Sparse Matmul☆18Jan 10, 2021Updated 5 years ago
- Distributed-memory, double-precision, polar decomposition (QDWH/ZOLO-PD) of a dense matrix, svd (QDWH/ZOLOPD-SVD) of a dense matrix☆14Jun 3, 2020Updated 6 years ago
- ☆14Jun 6, 2022Updated 4 years ago
- JUBE benchmarking environment configuration files☆10Oct 1, 2015Updated 10 years ago
- Simple problems implemented in CUDA C☆39Apr 7, 2025Updated last year
- Experimental Linear Algebra Performance Studies☆12Feb 24, 2017Updated 9 years ago
- Memory footprint reduction for transformer models☆11Jan 24, 2023Updated 3 years ago
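- Notes and materials collected by the SC group while studying high-performance computing, distributed architecture, data mining, and artificial intelligence☆15Oct 29, 2021Updated 4 years ago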
- ☆11Aug 4, 2022Updated 3 years ago
- Quantized Attention on GPU☆44Nov 22, 2024Updated last year
- ☆27Nov 20, 2025Updated 7 months ago
- MiniFE Finite Element Mini-Application☆41May 13, 2026Updated last month
- Multidimensional arrays for C++. (Not an official Boost library.) This is a mirror of gitlab.com/correaa/boost-multi☆20Updated this week
- Auto-differentiation library for C++☆12Jan 16, 2022Updated 4 years ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity☆246Sep 24, 2023Updated 2 years ago
- An open-source active learning framework for training machine-learned interatomic potentials☆43Jun 9, 2026Updated 3 weeks ago
- An extension of TVMScript to write simple and high-performance GPU kernels with Tensor Cores.☆52Jul 23, 2024Updated last year
- A framework for exploring solutions to the Travelling Salesman Problem.☆16Apr 18, 2015Updated 11 years ago