hku-systems / naspipeView external linksLinks
☆14Jan 12, 2022Updated 4 years ago
Alternatives and similar repositories for naspipe
Users that are interested in naspipe are comparing it to the libraries listed below
Sorting:
- PPoPP24 AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping☆22May 8, 2024Updated last year
- HOLMES: Health OnLine Model Ensemble Serving for Deep Learning Models in Intensive Care Units (KDD 2020)☆12Jan 25, 2021Updated 5 years ago
- A High-Throughput Multi-GPU System for Graph-Based Approximate Nearest Neighbor Search☆21Jul 22, 2025Updated 6 months ago
- [DATE 2023] Pipe-BD: Pipelined Parallel Blockwise Distillation☆12Jul 13, 2023Updated 2 years ago
- A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup☆35Jan 9, 2023Updated 3 years ago
- ☆37Oct 11, 2025Updated 4 months ago
- Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024)☆19May 28, 2024Updated last year
- Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Mult…☆41Mar 17, 2024Updated last year
- A GPU-accelerated DNN inference serving system that supports instant kernel preemption and biased concurrent execution in GPU scheduling.☆43May 29, 2022Updated 3 years ago
- Johnny Cache: the End of DRAM Cache Conflicts (in Tiered Main Memory Systems)☆20Aug 2, 2023Updated 2 years ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.☆91Nov 23, 2022Updated 3 years ago
- Source code for the paper: "A Latency-Predictable Multi-Dimensional Optimization Framework forDNN-driven Autonomous Systems"☆22Jan 4, 2021Updated 5 years ago
- Surrogate-based Hyperparameter Tuning System☆28Jun 29, 2023Updated 2 years ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning☆25May 12, 2025Updated 9 months ago
- ☆26Aug 19, 2022Updated 3 years ago
- Gensis is a lightweight deep learning framework written from scratch in Python, with Triton as its backend for high-performance computing…☆37Jan 15, 2026Updated 3 weeks ago
- ☆25Apr 3, 2023Updated 2 years ago
- DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.☆58Aug 21, 2024Updated last year
- Asynchronous pipeline parallel optimization☆19Feb 2, 2026Updated last week
- Artifact evaluation of PLDI'24 paper "Allo: A Programming Model for Composable Accelerator Design"☆33Apr 11, 2024Updated last year
- a deep learning-driven scheduler for elastic training in deep learning clusters☆31Jan 14, 2021Updated 5 years ago
- CUDA Flux is a profiler for GPU applications which reports the basic block executions frequencies of compute kernels☆32Mar 15, 2021Updated 4 years ago
- Anomaly Detection in Dynamic Graphs☆32Nov 1, 2023Updated 2 years ago
- ☆33Sep 9, 2020Updated 5 years ago
- ☆13Jan 28, 2026Updated 2 weeks ago
- Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion☆32May 15, 2024Updated last year
- This serves as a repository for reproducibility of the SC21 paper "In-Depth Analyses of Unified Virtual Memory System for GPU Accelerated…☆39Sep 25, 2023Updated 2 years ago
- ☆37Jan 20, 2022Updated 4 years ago
- ☆42Jun 13, 2025Updated 8 months ago
- ANT-ACE: Advanced Compiler Ecosystem for Fully Homomorphic Encryption and Domain Specific Computing☆54Jan 15, 2026Updated 3 weeks ago
- A simple MIPS CPU for BUAA CO course (and now NSCSCC).☆10May 15, 2021Updated 4 years ago
- A distributed stream querying engine that provides sub-millisecond stateful query at millions of queries per-second over fast-evolving li…☆10Jul 18, 2018Updated 7 years ago
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning☆10Apr 28, 2023Updated 2 years ago
- ☆10Mar 8, 2025Updated 11 months ago
- ☆10Jun 18, 2024Updated last year
- Canopy is a machine learning learning compiler stack with the capability of adopting high-end FPGAs. As a part of OpenAIOS project, Canop…☆12May 7, 2021Updated 4 years ago
- A simple script to plot the Roofline model for given HW platforms and applications☆10Aug 22, 2024Updated last year
- DELTA-pytorch:DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation☆12Apr 16, 2024Updated last year
- This is the code of a agentic rag method with dynamic workflow.☆13Jan 22, 2026Updated 3 weeks ago