☆19Oct 3, 2022Updated 3 years ago
Alternatives and similar repositories for cuda-tensorcores-register-mapping
Users that are interested in cuda-tensorcores-register-mapping are comparing it to the libraries listed below.
- Some "Formula Translations" for Yousef Saad's book "Iterative Methods for Sparse Linear Systems (2nd Edition)"☆13Jan 14, 2018Updated 8 years ago
- ☆14Jul 16, 2020Updated 5 years ago
- ☆30Oct 3, 2022Updated 3 years ago
- CUDA GPU implementation of GMRES iterative Solver☆10Apr 16, 2012Updated 14 years ago
- A Redis client for multi-threaded servers.☆47Apr 15, 2024Updated 2 years ago
- ☆23Oct 24, 2022Updated 3 years ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores☆42Jul 24, 2024Updated last year
- OpenFOAM right wmake at the right time☆11Mar 10, 2019Updated 7 years ago
- This is a high performance stub server.☆14Sep 3, 2024Updated last year
- Implementation of fused cosine similarity attention in the same style as Flash Attention☆220Feb 13, 2023Updated 3 years ago
- Working examples in the Vale programming language☆14Mar 21, 2022Updated 4 years ago
- Re-implementation of 'Grokking: Generalization beyond overfitting on small algorithmic datasets'☆38Dec 4, 2021Updated 4 years ago
- Fast GPU based tensor core reductions☆13Jan 13, 2023Updated 3 years ago
- Massively Scalable Parallel GMRES C-code for Sparse System of Equations☆13Feb 16, 2016Updated 10 years ago
- A wide array of parallel programs using CUDA, OpenCL, MPI, OpenMP and pthreads.☆14Jan 6, 2015Updated 11 years ago
- High-performance GEMM implementation optimized for NVIDIA H100 GPUs, leveraging Hopper architecture's TMA, WGMMA, and Thread Block Cluste…☆10Dec 4, 2024Updated last year
- ☆15Jan 27, 2011Updated 15 years ago
- `junior must know his place` team solution☆10Aug 15, 2023Updated 2 years ago
- llama.cpp inspired AI vibe coded support for LLMs in Nim.☆29May 16, 2026Updated last week
- An application for storing your notes for your tabletop RPG campaigns!☆21Jan 25, 2026Updated 4 months ago
- ☆11Apr 14, 2022Updated 4 years ago
- ☆16Nov 22, 2022Updated 3 years ago
- Official code for "Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving", ICML 2021☆30Sep 25, 2021Updated 4 years ago
- Solution for the final stage of the First China ECG Intelligence Challenge (public version). Competition website: http://mdi.ids.tsinghua.edu.cn/☆10Aug 21, 2019Updated 6 years ago
- Catalyst.Detection☆12Sep 13, 2021Updated 4 years ago
- Implementation of the Remixer Block from the Remixer paper, in Pytorch☆36Sep 27, 2021Updated 4 years ago
- A highly efficient library for GEMM operations on Sunway TaihuLight☆18Sep 7, 2020Updated 5 years ago
- Authors implementation of LieTransformer: Equivariant Self-Attention for Lie Groups☆36Feb 5, 2021Updated 5 years ago
- Simple example of how to write an Implicit GEMM Convolution in CUDA using the tensor core WMMA API and bindings for PyTorch.☆18Jun 29, 2023Updated 2 years ago
- xxhash wrapper for Nim☆19Mar 20, 2025Updated last year
- A simple implementation of a GPT-style Transformer architecture and inference.☆16Jan 26, 2024Updated 2 years ago
- Anchor Assignment and Sampling Heuristics in Deep Object Detection: A Review☆11Aug 2, 2022Updated 3 years ago
- A python library for highly configurable transformers - easing model architecture search and experimentation.☆48Nov 30, 2021Updated 4 years ago
- Computing the non-convex risk parity portfolio problems by the non-convex quadratic approximation (NCQA), interior point method (IPM) and…☆26Oct 23, 2022Updated 3 years ago
- ☆81Jan 21, 2022Updated 4 years ago
- A simple, flexible, zero-dependency modal stack manager for React.☆14Jan 6, 2023Updated 3 years ago
- Implementation of Flash Attention in Jax☆228Mar 1, 2024Updated 2 years ago
- High performance pytorch modules☆18Jan 14, 2023Updated 3 years ago
- An Attention Superoptimizer☆22Jan 20, 2025Updated last year