A high-performance attention mechanism that computes softmax normalization in a single streaming pass using running accumulators (online softmax).
☆29 · Oct 11, 2025 · Updated 6 months ago
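The single-pass idea in the description can be illustrated with a minimal sketch. This is not StreamAttn's code or API; the function names `streaming_attention` and `reference_attention`, the single-query setup, and the pure-Python lists are assumptions chosen for illustration. It keeps a running maximum, a running normalizer, and a running weighted sum, and rescales the accumulators whenever a larger score arrives.

```python
import math


def streaming_attention(query, keys, values):
    """Single-query attention in one pass over keys/values (online softmax).

    Hypothetical helper for illustration; not taken from StreamAttn.
    """
    scale = 1.0 / math.sqrt(len(query))
    running_max = float("-inf")    # largest score seen so far
    running_sum = 0.0              # sum of exp(score - running_max)
    acc = [0.0] * len(values[0])   # unnormalized weighted sum of values

    for key, value in zip(keys, values):
        score = scale * sum(q * k for q, k in zip(query, key))
        new_max = max(running_max, score)
        # Rescale everything accumulated under the old maximum.
        correction = math.exp(running_max - new_max)
        weight = math.exp(score - new_max)
        running_sum = running_sum * correction + weight
        acc = [a * correction + weight * v for a, v in zip(acc, value)]
        running_max = new_max

    return [a / running_sum for a in acc]


def reference_attention(query, keys, values):
    """Conventional softmax attention over materialized scores, for comparison."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
```

Both functions should agree up to floating-point error. The streaming version never stores the full score vector, which is what makes the approach attractive for long sequences and tiled GPU kernels.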
Alternatives and similar repositories for StreamAttn
Users that are interested in StreamAttn are comparing it to the libraries listed below.
- Learning about CUDA by writing PTX code. ☆159 · Feb 27, 2024 · Updated 2 years ago
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. ☆19 · Feb 9, 2026 · Updated 2 months ago
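- The top 5 easiest-to-use VPNs ("ladders", "airports") of 2026: recommendations and an analysis of free proxy tools, optimized for users in China, combining high-speed connections, top-tier security, and good value. Global node acceleration with free switching between multiple nodes lets you easily unlock restricted services such as ChatGPT, Google, YouTube, Netflix, and TikTok; supports Androi… ☆30 · Apr 27, 2026 · Updated last week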
- Because it's there. ☆16 · Sep 22, 2024 · Updated last year
- Setting up Vscode to work with Pytorch in C/C++ with CUDA support ☆25 · Feb 5, 2025 · Updated last year
- A rust wrapper for HIP ☆12 · Jun 10, 2025 · Updated 10 months ago
- Bindings for o1js to lower layers of the proof system and the Mina transaction logic ☆13 · Apr 25, 2025 · Updated last year
- ☆12 · Mar 31, 2023 · Updated 3 years ago
- ☆24 · Apr 7, 2026 · Updated 3 weeks ago
- Compare Bloxroute and Fiber transaction streams ☆10 · Nov 22, 2024 · Updated last year
- CDLS: Proving Knowledge of Committed Discrete Logarithms with Soundness ☆11 · Updated this week
- The entry point for Rust projects to be run on Valida ☆10 · Mar 14, 2025 · Updated last year
- Virtual wearable ☆15 · Mar 14, 2023 · Updated 3 years ago
- ☆28 · Jun 22, 2025 · Updated 10 months ago
- ☆12 · Jun 5, 2025 · Updated 10 months ago
- 🦎 Prototypes on polymorphic, metamorphic and poly-metamorphic malwares in Rust 🦎 ☆14 · Oct 8, 2023 · Updated 2 years ago
- ☆12 · Oct 4, 2023 · Updated 2 years ago
- ☆12 · Sep 11, 2024 · Updated last year
- A collection of GPU experiments and benchmarks for my personal understanding and research. ☆30 · Apr 9, 2026 · Updated 3 weeks ago
- Decentralised Privacy-Preserving Contact Discovery ☆18 · Jul 4, 2023 · Updated 2 years ago
- ☆12 · Feb 18, 2025 · Updated last year
- Seamlessly merge multi-track audio into a single unified transcript - perfect with Craig.chat ☆33 · Oct 31, 2025 · Updated 6 months ago
- ☆20 · Apr 24, 2026 · Updated last week
- A fast 2-level, sparse bloom filter implementation ☆16 · Jan 13, 2026 · Updated 3 months ago
- A Bitcoin-based access protocol for encrypted secrets: verifiable access through proof-of-work, not permission. ☆25 · Jun 2, 2025 · Updated 11 months ago
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. ☆21 · Jan 24, 2025 · Updated last year
- Tasking 2.0 ☆17 · Nov 1, 2021 · Updated 4 years ago
- Lego for GRPO ☆30 · May 27, 2025 · Updated 11 months ago
- Code for Robust Fine-tuning (RbFT) ☆17 · Jan 31, 2025 · Updated last year
- An implementation of a verifiable oblivious pseudorandom function (RFC 9497) ☆82 · Apr 6, 2026 · Updated 3 weeks ago
- Multi Stopwatch for Python ☆12 · Sep 28, 2019 · Updated 6 years ago
- ☆22 · May 5, 2025 · Updated last year
- WhatsApp statistics toolkit mirror ☆10 · Mar 24, 2019 · Updated 7 years ago
- Verified implementations for the Noise family of protocols ☆17 · Jun 18, 2024 · Updated last year
- ☆17 · Apr 14, 2022 · Updated 4 years ago
- ☆13 · Mar 11, 2018 · Updated 8 years ago
- Personal solutions to the Triton Puzzles ☆21 · Jul 18, 2024 · Updated last year
- A tag editor written in C# and WPF ☆12 · Aug 20, 2023 · Updated 2 years ago
- ☆19 · Mar 3, 2025 · Updated last year