DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit
☆92 Jan 26, 2026 Updated last month
Alternatives and similar repositories for DLSlime
Users who are interested in DLSlime are comparing it to the libraries listed below
- [DAC2024] A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning ☆15 Jan 13, 2024 Updated 2 years ago
- [Archived] For the latest updates and community contribution, please visit: https://github.com/Ascend/TransferQueue or https://gitcode.co… ☆13 Jan 16, 2026 Updated last month
- ☆13 May 23, 2025 Updated 9 months ago
- High-performance distributed data shuffling (all-to-all) library for MoE training and inference ☆112 Dec 31, 2025 Updated 2 months ago
- High Performance KV Cache Store for LLM ☆46 Feb 7, 2026 Updated 3 weeks ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆163 Feb 11, 2026 Updated 2 weeks ago
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆91 Updated this week
- a simple API to use CUPTI ☆11 Aug 19, 2025 Updated 6 months ago
- ☆87 Updated this week
- A lightweight design for computation-communication overlap. ☆221 Jan 20, 2026 Updated last month
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆51 Jul 4, 2025 Updated 7 months ago
- An Attention Superoptimizer ☆22 Jan 20, 2025 Updated last year
- Perplexity open source garden for inference technology ☆367 Dec 25, 2025 Updated 2 months ago
- Debug print operator for cudagraph debugging ☆14 Aug 2, 2024 Updated last year
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust. ☆14 Nov 23, 2024 Updated last year
- AI model training on heterogeneous, geo-distributed resources ☆37 Nov 24, 2025 Updated 3 months ago
- DLBlas: clean and efficient kernels ☆33 Updated this week
- ☆74 Oct 31, 2024 Updated last year
- ☆23 Feb 13, 2026 Updated 2 weeks ago
- ☆84 Jan 22, 2026 Updated last month
- ☆38 Aug 7, 2025 Updated 6 months ago
- ☆65 Apr 26, 2025 Updated 10 months ago
- DeepSeek-V3/R1 inference performance simulator ☆177 Mar 27, 2025 Updated 11 months ago
- GPU Affinity is a package to automatically set the CPU process affinity to match the hardware architecture on a given platform ☆29 Dec 8, 2023 Updated 2 years ago
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning ☆10 Apr 28, 2023 Updated 2 years ago
- ☆74 Feb 11, 2026 Updated 2 weeks ago
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… ☆466 Dec 31, 2025 Updated last month
- Surrogate-based Hyperparameter Tuning System ☆28 Jun 29, 2023 Updated 2 years ago
- A reproduction of the libsmctrl paper with an added Python interface, so compute resources can be allocated flexibly from Python ☆12 May 21, 2024 Updated last year
- A docker image for One Student One Chip's debug exam ☆10 Sep 22, 2023 Updated 2 years ago
- FlagCX is a scalable and adaptive cross-chip communication library. ☆174 Updated this week
- ☆76 Nov 22, 2024 Updated last year
- Gensis is a lightweight deep learning framework written from scratch in Python, with Triton as its backend for high-performance computing… ☆37 Jan 15, 2026 Updated last month
- ☆11 Apr 5, 2021 Updated 4 years ago
- ☆13 Nov 21, 2024 Updated last year
- ☆25 Oct 11, 2025 Updated 4 months ago
- NVIDIA Networking NIC Configuration Operator For Kubernetes ☆14 Updated this week
- ByteCheckpoint: An Unified Checkpointing Library for LFMs ☆271 Feb 2, 2026 Updated 3 weeks ago
- Nex Venus Communication Library ☆72 Nov 17, 2025 Updated 3 months ago