Parallel Prefix Sum (Scan) with CUDA
☆29Jun 22, 2024Updated last year
Alternatives and similar repositories for parallel_prefix_sum
Users that are interested in parallel_prefix_sum are comparing it to the libraries listed below.
- C++ implementation of Stanford CS149 Assignment 1☆14Feb 19, 2023Updated 3 years ago
- Fast GPU based tensor core reductions☆13Jan 13, 2023Updated 3 years ago
- A distributed key value database based on LSM Tree storage☆15Aug 24, 2022Updated 3 years ago
- ☆10May 12, 2022Updated 3 years ago
- CUDA C simple application for Nvidia's GPU☆11Jun 7, 2022Updated 3 years ago
- [ICML 2026] Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning☆32Sep 12, 2025Updated 7 months ago
- HTML/JS port of CUDA Occupancy Calculator☆17Nov 23, 2021Updated 4 years ago
- Minimal PyTorch implementation of TP, SP, FSDP and sharded-EMA☆32Nov 27, 2025Updated 5 months ago
- GEMM by WMMA (tensor core)☆15Jul 31, 2022Updated 3 years ago
- [ICLR 2026] Official PyTorch implementation for "ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding"☆63Dec 26, 2025Updated 4 months ago
- ☆31Aug 18, 2025Updated 8 months ago
- ☆15Jun 26, 2024Updated last year
- Memory experiments with LLMs☆10Mar 31, 2023Updated 3 years ago
- 📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉☆16Mar 30, 2025Updated last year
- ☆34Oct 13, 2025Updated 6 months ago
- This repo contains LaTeX template for experiment report.☆11Aug 17, 2021Updated 4 years ago
- YOLOv8 C++ DET, SEG, POSE TensorRT inference library, convenient for learning, development, and extension, and for real-world deployment at work☆18Aug 22, 2023Updated 2 years ago
- Homework of CMU 10-414/714: Deep Learning Systems (https://dlsyscourse.org/)☆15Mar 21, 2024Updated 2 years ago
- A tool for monitoring container behavior, built on eBPF☆16May 9, 2025Updated 11 months ago
- Democratizing AlphaFold3: a PyTorch reimplementation to accelerate protein structure prediction☆21May 24, 2025Updated 11 months ago
- Independent Multi-Modal Segmentation☆13Jun 12, 2025Updated 10 months ago
- Utility functions/scripts for working with GPUs.☆10Jul 5, 2021Updated 4 years ago
- 2D Deformable Convolution Network, Training and Testing on Cell Images☆12Jan 26, 2018Updated 8 years ago
- ACL 2026 & NAACL 2025: Bridging Retrieval and Inference through Evidence Fusion☆13Apr 9, 2026Updated 3 weeks ago
- A Winograd Minimal Filter Implementation in CUDA☆29Aug 25, 2021Updated 4 years ago
- operate the xml files in the VOC dataset☆11Mar 23, 2019Updated 7 years ago
- ☆10Oct 8, 2022Updated 3 years ago
- TopViewRS: Vision-Language Models as Top-View Spatial Reasoners (EMNLP 2024 Oral)☆15Jun 14, 2025Updated 10 months ago
- Evaluate state-of-the-art sparse embedding models on the LIMIT dataset (`limit-small` and `limit`) from Google's paper `On the Theoretica…☆16Sep 4, 2025Updated 8 months ago
- CUDA project for uni subject☆26Oct 26, 2020Updated 5 years ago
- [NSDI25] AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training☆31May 2, 2025Updated last year
- ☆20Mar 18, 2026Updated last month
- [ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization☆14Nov 27, 2024Updated last year
- ☆25Sep 1, 2025Updated 8 months ago
- LaLaRAND: Flexible Layer-by-Layer CPU/GPU Scheduling for Real-Time DNN Tasks☆18Mar 25, 2022Updated 4 years ago
- ☆25May 7, 2021Updated 4 years ago
- Text data mining course project: sentiment analysis implemented with naive Bayes, SVM, a sentiment lexicon, LSTM, and TextCNN☆16Jun 16, 2023Updated 2 years ago
- Harbin Institute of Technology Spring 2022 Database Systems (HIT-DBMS) materials (labs, assignments, final exam review materials)☆13May 26, 2022Updated 3 years ago
- Nicol is an open-source web service, developed using the Kotlin programming language, that enables streaming Server Stream Events and s…☆12Dec 10, 2023Updated 2 years ago