Parallel Prefix Sum (Scan) with CUDA
☆29Jun 22, 2024Updated last year
Alternatives and similar repositories for parallel_prefix_sum
Users who are interested in parallel_prefix_sum are comparing it to the libraries listed below.
- C++ implementation of Stanford CS149 Assignment 1☆14Feb 19, 2023Updated 3 years ago
- Fast GPU based tensor core reductions☆13Jan 13, 2023Updated 3 years ago
- A distributed key value database based on LSM Tree storage☆15Aug 24, 2022Updated 3 years ago
- a simple API to use CUPTI☆10Aug 19, 2025Updated 9 months ago
- ☆32Dec 29, 2025Updated 4 months ago
- CUDA C simple application for Nvidia's GPU☆11Jun 7, 2022Updated 3 years ago
- GEMM by WMMA (tensor core)☆15Jul 31, 2022Updated 3 years ago
- [ICLR 2026] Official PyTorch implementation for "ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding"☆63Dec 26, 2025Updated 5 months ago
- An Open-Source RAG Workload Trace to Optimize RAG Serving Systems☆36Nov 18, 2025Updated 6 months ago
- ☆31Aug 18, 2025Updated 9 months ago
- ☆28Oct 2, 2025Updated 7 months ago
- Memory experiments with LLMs☆10Mar 31, 2023Updated 3 years ago
- ☆19May 26, 2023Updated 3 years ago
- A code sample demonstrating how to share and rebuild a PyTorch GPU tensor via its pointer/reference between different processes.☆15Aug 27, 2024Updated last year
- LONGAGENT: Scaling Language Models to 128k Context through Multi-Agent Collaboration☆11Mar 11, 2024Updated 2 years ago
- My Assignment for CSE 599w http://dlsys.cs.washington.edu/☆15Dec 2, 2019Updated 6 years ago
- Homework of CMU 10-414/714: Deep Learning Systems (https://dlsyscourse.org/)☆15Mar 21, 2024Updated 2 years ago
- Democratizing AlphaFold3: a PyTorch reimplementation to accelerate protein structure prediction☆21May 24, 2025Updated last year
- Independent Multi-Modal Segmentation☆13Jun 12, 2025Updated 11 months ago
- Rebuild YatSenOS On RISC-V 64.☆23Jan 6, 2022Updated 4 years ago
- ACL 2026 & NAACL 2025: Bridging Retrieval and Inference through Evidence Fusion☆13Apr 9, 2026Updated last month
- A Winograd Minimal Filter Implementation in CUDA☆29Aug 25, 2021Updated 4 years ago
- operate the xml files in the VOC dataset☆11Mar 23, 2019Updated 7 years ago
- [NSDI25] AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training☆31May 2, 2025Updated last year
- CUDA project for uni subject☆26Oct 26, 2020Updated 5 years ago
- NVIDIA cuTile learn☆168Dec 9, 2025Updated 5 months ago
- ☆14Jun 9, 2021Updated 4 years ago
- [ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization☆14Nov 27, 2024Updated last year
- ☆25Sep 1, 2025Updated 8 months ago
- ☆25May 7, 2021Updated 5 years ago
- Nicol is an open-source web service, developed using the Kotlin programming language, that enables streaming Server Stream Events and s…☆12Dec 10, 2023Updated 2 years ago
- Text data mining course project: sentiment analysis implemented with naive Bayes, SVM, a sentiment lexicon, LSTM, and TextCNN☆16Jun 16, 2023Updated 2 years ago
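- Materials for the Harbin Institute of Technology Spring 2022 Database Systems course (HIT-DBMS): labs, assignments, and final exam review materials☆13May 26, 2022Updated 4 years ago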
- Personalized Fragrance Recommendation for Aromatherapy: A Machine Learning Approach Based on Personality Traits and Electrodermal Activit…☆10May 1, 2025Updated last year
- ☆85Apr 18, 2025Updated last year
- ☆12May 18, 2020Updated 6 years ago
- Qt C++ with OpenCV☆16Jan 2, 2016Updated 10 years ago
- ☆39May 20, 2025Updated last year
- Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Mult…☆40Mar 17, 2024Updated 2 years ago