An implementation of parallel exclusive scan in CUDA
☆67 · Feb 23, 2018 · Updated 8 years ago
Alternatives and similar repositories for cuda
Users that are interested in cuda are comparing it to the libraries listed below.
- Parallel Prefix Sum (Scan) with CUDA. ☆15 · Jul 17, 2020 · Updated 5 years ago
- Efficient CUDA Stream Compaction Library ☆34 · Jun 9, 2023 · Updated 2 years ago
- A Haskell implementation of the Formality language ☆18 · Mar 9, 2020 · Updated 6 years ago
- A simple, untyped, terminating functional language that is fully compatible with optimal reductions. ☆18 · Jun 17, 2019 · Updated 6 years ago
- Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs ☆16 · Feb 28, 2019 · Updated 7 years ago
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification ☆11 · Aug 12, 2023 · Updated 2 years ago
- ☆14 · Dec 5, 2024 · Updated last year
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" ☆18 · Mar 15, 2024 · Updated 2 years ago
- Sample CMake template solving Ax=b ☆15 · Mar 17, 2021 · Updated 5 years ago
- MIT-licensed stand-alone CUDA utility functions. ☆16 · Jul 3, 2020 · Updated 5 years ago
- Calculus of Constructions ☆18 · Jul 17, 2019 · Updated 6 years ago
- JAX/Flax implementation of the Hyena Hierarchy ☆34 · Apr 27, 2023 · Updated 2 years ago
- Radix sort analyses in parallel and serial ways. ☆10 · Jan 21, 2016 · Updated 10 years ago
- Official repository for the paper "Exploring the Promise and Limits of Real-Time Recurrent Learning" (ICLR 2024) ☆13 · Jun 11, 2025 · Updated 10 months ago
- Source-to-Source Debuggable Derivatives in Pure Python ☆15 · Jan 23, 2024 · Updated 2 years ago
- Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024) ☆16 · Jan 7, 2025 · Updated last year
- Triton Implementation of HyperAttention Algorithm ☆48 · Dec 11, 2023 · Updated 2 years ago
- ☆13 · Apr 15, 2024 · Updated 2 years ago
- Multiplication on optimal λ-calculus reducers ☆23 · Aug 14, 2020 · Updated 5 years ago
- Implementation and analysis of five different GPU based SPMV algorithms in CUDA ☆40 · Feb 5, 2019 · Updated 7 years ago
- ☆33 · Mar 31, 2025 · Updated last year
- Implementation of lid driven cavity solver based on SIMPLE algorithm ☆16 · Jan 11, 2019 · Updated 7 years ago
- Parallel Associative Scan for Language Models ☆18 · Jan 8, 2024 · Updated 2 years ago
- Blog post ☆17 · Feb 16, 2024 · Updated 2 years ago
- ☆14 · Mar 10, 2020 · Updated 6 years ago
- ☆45 · Apr 30, 2018 · Updated 7 years ago
- Use tensor core to calculate back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instruction. ☆13 · Nov 3, 2023 · Updated 2 years ago
- Curse-of-memory phenomenon of RNNs in sequence modelling ☆19 · May 8, 2025 · Updated 11 months ago
- A new QR decomposition algorithm implemented in CUDA ☆18 · Jun 24, 2024 · Updated last year
- Official Repository for Efficient Linear-Time Attention Transformers. ☆18 · Jun 2, 2024 · Updated last year
- Runs a single CUDA/OpenCL kernel, taking its source from a file and arguments from the command-line ☆25 · Apr 11, 2026 · Updated last week
- Parallel cuckoo hashing on GPUs with CUDA ☆12 · Sep 27, 2019 · Updated 6 years ago
- Code for paper: End-to-end Stochastic Optimization with Energy-based Model ☆16 · Feb 14, 2023 · Updated 3 years ago
- AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning (Published in TMLR) ☆23 · Oct 15, 2024 · Updated last year
- Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719 ☆22 · Jun 5, 2024 · Updated last year
- Python implementation of paper "AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks" ☆15 · Aug 2, 2019 · Updated 6 years ago
- ☆15 · Mar 15, 2022 · Updated 4 years ago
- A python algorithm to change the pitch of the voice in real time ☆13 · Dec 13, 2020 · Updated 5 years ago
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation) ☆34 · Aug 6, 2023 · Updated 2 years ago