Parallel Prefix Sum (Scan) with CUDA
☆29Jun 22, 2024Updated last year
Alternatives and similar repositories for parallel_prefix_sum
Users that are interested in parallel_prefix_sum are comparing it to the libraries listed below.
- C++ implementation of Stanford CS149 Assignment 1☆14Feb 19, 2023Updated 3 years ago
- Fast GPU based tensor core reductions☆13Jan 13, 2023Updated 3 years ago
- ☆10May 12, 2022Updated 3 years ago
- ☆34Dec 19, 2025Updated 3 months ago
- Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning☆31Sep 12, 2025Updated 7 months ago
- CUDA C simple application for Nvidia's GPU☆11Jun 7, 2022Updated 3 years ago
- 🐱 ncnn int8 model quantization evaluation☆14Oct 10, 2022Updated 3 years ago
- HTML/JS port of CUDA Occupancy Calculator☆17Nov 23, 2021Updated 4 years ago
- Minimal PyTorch implementation of TP, SP, FSDP and sharded-EMA☆32Nov 27, 2025Updated 4 months ago
- GEMM by WMMA (tensor core)☆15Jul 31, 2022Updated 3 years ago
- SGEMM optimization with cuda step by step☆22Mar 23, 2024Updated 2 years ago
- [ICLR 2026] Official PyTorch implementation for "ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding"☆61Dec 26, 2025Updated 3 months ago
- ☆17May 26, 2023Updated 2 years ago
- A code sample demonstrating how to share and rebuild a PyTorch GPU tensor via its pointer/reference between different processes.☆15Aug 27, 2024Updated last year
- Rebuild YatSenOS On RISC-V 64.☆23Jan 6, 2022Updated 4 years ago
- This repo contains LaTeX template for experiment report.☆11Aug 17, 2021Updated 4 years ago
- LONGAGENT: Scaling Language Models to 128k Context through Multi-Agent Collaboration☆11Mar 11, 2024Updated 2 years ago
- [ICML 2025 Spotlight] RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding☆22Mar 2, 2025Updated last year
- My Assignment for CSE 599w http://dlsys.cs.washington.edu/☆15Dec 2, 2019Updated 6 years ago
- Homework of CMU 10-414/714: Deep Learning Systems (https://dlsyscourse.org/)☆15Mar 21, 2024Updated 2 years ago
- A tool for monitoring container behavior, built on eBPF☆16May 9, 2025Updated 11 months ago
- Independent Multi-Modal Segmentation☆12Jun 12, 2025Updated 10 months ago
- Utility functions/scripts for working with GPUs.☆10Jul 5, 2021Updated 4 years ago
- ACL 2026 & NAACL 2025: Bridging Retrieval and Inference through Evidence Fusion☆13Updated this week
- ☆10Sep 9, 2021Updated 4 years ago
- operate the xml files in the VOC dataset☆11Mar 23, 2019Updated 7 years ago
- ☆10Oct 8, 2022Updated 3 years ago
- TopViewRS: Vision-Language Models as Top-View Spatial Reasoners (EMNLP 2024 Oral)☆15Jun 14, 2025Updated 10 months ago
- Evaluate state-of-the-art sparse embedding models on the LIMIT dataset (`limit-small` and `limit`) from google's paper `On the Theoretica…☆16Sep 4, 2025Updated 7 months ago
- NVIDIA cuTile learn☆166Dec 9, 2025Updated 4 months ago
- [NSDI25] AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training☆31May 2, 2025Updated 11 months ago
- ☆20Mar 18, 2026Updated 3 weeks ago
- [ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization☆14Nov 27, 2024Updated last year
- Just save my record on github...☆27Feb 7, 2021Updated 5 years ago
- openai-proxy-vercel☆12Aug 11, 2023Updated 2 years ago
- ☆25Sep 1, 2025Updated 7 months ago
- Nicol is an open-source web service, developed using the Kotlin programming language, that enables streaming Server Stream Events and s…☆12Dec 10, 2023Updated 2 years ago
- ☆84Apr 18, 2025Updated 11 months ago
- Personalized Fragrance Recommendation for Aromatherapy: A Machine Learning Approach Based on Personality Traits and Electrodermal Activit…☆10May 1, 2025Updated 11 months ago