The open-source portion of the code to reproduce "BPPSA: Scaling Back-propagation by Parallel Scan Algorithm".
☆13Jun 7, 2021Updated 4 years ago
Alternatives and similar repositories for BPPSA-open
Users who are interested in BPPSA-open are comparing it to the libraries listed below:
- Switch-based Training Acceleration for Machine Learning (SwitchML)☆16Apr 13, 2021Updated 4 years ago
- ☆14Nov 7, 2025Updated 3 months ago
- A PyTorch wrapper of parallel exclusive scan in CUDA☆12May 25, 2023Updated 2 years ago
- ☆24May 6, 2022Updated 3 years ago
- ☆27Mar 2, 2023Updated 3 years ago
- This repository corresponds to the PICCO compiler for secure multi-party computation published in 2013 with more recent efficiency improv…☆12Feb 24, 2026Updated last week
- Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c…☆27Dec 10, 2022Updated 3 years ago
- Cocytus is an efficient and available in-memory K/V-store through hybrid erasure coding and replication☆31Mar 7, 2016Updated 9 years ago
- Code released to accompany the ISCA paper: "T4: Compiling Sequential Code for Effective Speculative Parallelization in Hardware"☆28Feb 18, 2022Updated 4 years ago
- A rust-based benchmark for BlueField SmartNICs.☆30Jul 5, 2023Updated 2 years ago
- Prefix-Aware Attention for LLM Decoding☆29Jan 23, 2026Updated last month
- Johann, the lightweight and flexible scenario orchestrator☆12Oct 3, 2022Updated 3 years ago
- Code accompanying the NeurIPS 2019 paper AutoAssist: A Framework to Accelerate Training of Deep Neural Networks.☆14Oct 3, 2022Updated 3 years ago
- A math-wiki reference for university students, competition participants, and high school students☆10May 26, 2022Updated 3 years ago
- netbeacon - monitoring your network capture, NIDS or network analysis process☆19Oct 26, 2013Updated 12 years ago
- Community maintained hardware plugin for vLLM on AWS Neuron☆23Feb 26, 2026Updated last week
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning☆10Apr 28, 2023Updated 2 years ago
- LITS: An Optimized Learned Index for Strings☆13Jun 18, 2025Updated 8 months ago
- ☆44Nov 15, 2021Updated 4 years ago
- Homebrew formulas for installing LLM and related tools☆15Sep 6, 2023Updated 2 years ago
- sgx-based encrypted deduplication prototype☆14May 14, 2021Updated 4 years ago
- ☆13Jan 21, 2022Updated 4 years ago
- NVMesh Container Storage Interface (CSI) Driver for Kubernetes☆11Oct 7, 2024Updated last year
- A Coq framework to support structural design and proof of hardware cache-coherence protocols☆14May 7, 2022Updated 3 years ago
- ☆15Jul 18, 2023Updated 2 years ago
- ☆11Sep 22, 2017Updated 8 years ago
- Proposal for the next generation of course-oriented IR.☆10Dec 24, 2021Updated 4 years ago
- ☆10Jun 28, 2025Updated 8 months ago
- A Java port of Jieba 0.39 that supports all core features of the original Jieba☆12Feb 14, 2019Updated 7 years ago
- ☆11Apr 3, 2023Updated 2 years ago
- ☆10May 16, 2021Updated 4 years ago
- ☆11Oct 11, 2023Updated 2 years ago
- ☆10Jun 4, 2021Updated 4 years ago
- A tool for cross-checking Verilog compilers☆14Apr 16, 2025Updated 10 months ago
- Disco Stochastic Network Calculator☆10Aug 15, 2017Updated 8 years ago
- Beep the PC speaker☆11Nov 9, 2022Updated 3 years ago
- For our ISSTA'23 paper ACETest: Automated Constraint Extraction for Testing Deep Learning Operators☆13Mar 30, 2024Updated last year
- Homework solutions to 2017 Fall Algorithm Courses in ShanghaiTech☆10Jan 5, 2018Updated 8 years ago
- Repository of materials used at the Kanazawa artificial intelligence study group and meetup, including presentation slides and Python programs☆12Oct 11, 2020Updated 5 years ago