☆19Oct 3, 2022Updated 3 years ago
Alternatives and similar repositories for cuda-tensorcores-register-mapping
Users that are interested in cuda-tensorcores-register-mapping are comparing it to the libraries listed below.
- Some "Formula Translations" for Yousef Saad's book "Iterative Methods for Sparse Linear Systems (2nd Edition)"☆13Jan 14, 2018Updated 8 years ago
- ☆14Jul 16, 2020Updated 5 years ago
- CUDA GPU implementation of GMRES iterative Solver☆10Apr 16, 2012Updated 14 years ago
- ☆12Jan 19, 2020Updated 6 years ago
- ☆13Jan 18, 2020Updated 6 years ago
- ☆12Jan 13, 2023Updated 3 years ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores☆42Jul 24, 2024Updated last year
- KITTI Point Cloud Utilities☆12Jul 25, 2024Updated last year
- OpenFOAM right wmake at the right time☆11Mar 10, 2019Updated 7 years ago
- redux es5☆12May 12, 2016Updated 10 years ago
- Implementation of fused cosine similarity attention in the same style as Flash Attention☆220Feb 13, 2023Updated 3 years ago
- Re-implementation of 'Grokking: Generalization beyond overfitting on small algorithmic datasets'☆38Dec 4, 2021Updated 4 years ago
- 2D/ 3D object detection, segmentation, depth estimation for self-driving car☆10Feb 18, 2021Updated 5 years ago
- PyTorch Implementation of Thermal Image Enhancement Network☆17Dec 19, 2018Updated 7 years ago
- arkit demo☆11Aug 20, 2018Updated 7 years ago
- ☆10Sep 13, 2021Updated 4 years ago
- Massively Scalable Parallel GMRES C-code for Sparse System of Equations☆13Feb 16, 2016Updated 10 years ago
- TensorRT5 Execution Sample from Python API☆12Nov 10, 2018Updated 7 years ago
- ☆21Aug 21, 2023Updated 2 years ago
- Matlab implementations of communication-avoiding Krylov subspace methods☆12Sep 2, 2021Updated 4 years ago
- `junior must know his place` team solution☆10Aug 15, 2023Updated 2 years ago
- A new QR decomposition algorithm implemented in CUDA☆18Jun 24, 2024Updated last year
- An application for storing your notes for your tabletop RPG campaigns!☆21Jan 25, 2026Updated 4 months ago
- Implementing a parallelized conjugate gradient algorithm using a hybrid of distributed (MPI) and shared (OpenMP) memory approach.☆11Dec 8, 2018Updated 7 years ago
- ☆11Apr 14, 2022Updated 4 years ago
- Draw in the world around you with OpenGL, and OpenCV☆13May 7, 2014Updated 12 years ago
- ☆16Nov 22, 2022Updated 3 years ago
- Official code for "Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving", ICML 2021☆30Sep 25, 2021Updated 4 years ago
- Catalyst.Detection☆12Sep 13, 2021Updated 4 years ago
- Implementation of the Remixer Block from the Remixer paper, in Pytorch☆36Sep 27, 2021Updated 4 years ago
- A highly efficient library for GEMM operations on Sunway TaihuLight☆18Sep 7, 2020Updated 5 years ago
- Guide on how to convert custom PyTorch layers when using ONNX.☆22Sep 4, 2018Updated 7 years ago
- Authors' implementation of LieTransformer: Equivariant Self-Attention for Lie Groups☆36Feb 5, 2021Updated 5 years ago
- Simple example of how to write an Implicit GEMM Convolution in CUDA using the tensor core WMMA API and bindings for PyTorch.☆18Jun 29, 2023Updated 2 years ago
- A course on programming in open-source FVM code OpenFOAM☆18Jan 5, 2016Updated 10 years ago
- ☆15Jan 11, 2023Updated 3 years ago
- Anchor Assignment and Sampling Heuristics in Deep Object Detection: A Review☆11Aug 2, 2022Updated 3 years ago
- Matrix-Vector Multiplication Using Shared and Coalesced Memory Access☆16Apr 9, 2013Updated 13 years ago
- code for paper -- "Seamless Satellite-image Synthesis"☆17Jul 30, 2024Updated last year