Code release for the book "Efficient Training in PyTorch"
☆125 · Apr 10, 2025 · Updated 10 months ago
Alternatives and similar repositories for EfficientPyTorch
Users that are interested in EfficientPyTorch are comparing it to the libraries listed below
- ☆152 · Jan 9, 2025 · Updated last year
- modified cutlass ☆15 · Oct 26, 2020 · Updated 5 years ago
- Interactive surface flow toy implemented in Taichi ☆66 · Oct 22, 2022 · Updated 3 years ago
- A unified programming framework for high and portable performance across FPGAs and GPUs ☆11 · Mar 23, 2025 · Updated 11 months ago
- A demo illustrating how to use Taichi as an AOT shader compiler ☆76 · Apr 7, 2025 · Updated 10 months ago
- Soft2D: A 2D multi-material continuum physics engine designed for real-time applications. ☆52 · Aug 21, 2023 · Updated 2 years ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆93 · Jan 16, 2026 · Updated last month
- Utilities for paper writing. ☆12 · Jan 11, 2026 · Updated last month
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters ☆57 · Jul 23, 2024 · Updated last year
- ☆13 · Nov 1, 2021 · Updated 4 years ago
- My attempt to improve the speed of the Newton-Schulz algorithm, starting from the dion implementation. ☆32 · Dec 5, 2025 · Updated 2 months ago
- a high performance system for customized-precision distributed deep learning ☆12 · Dec 10, 2020 · Updated 5 years ago
- ☆87 · Updated this week
- ☆14 · May 28, 2019 · Updated 6 years ago
- HeteroHalide: From Image Processing DSL to Efficient FPGA Acceleration ☆15 · Sep 14, 2020 · Updated 5 years ago
- ☆17 · May 18, 2022 · Updated 3 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆35 · Jul 28, 2020 · Updated 5 years ago
- InfiniBand Diagnostic Tools (DEPRECATED, part of rdma-core) ☆18 · May 12, 2019 · Updated 6 years ago
- Hi SPH in taichi! ☆17 · Feb 6, 2023 · Updated 3 years ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆51 · Jul 4, 2025 · Updated 7 months ago
- ☆123 · Feb 24, 2026 · Updated last week
- ☆38 · Aug 7, 2025 · Updated 6 months ago
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure. ☆19 · May 12, 2024 · Updated last year
- TaichiCon: Taichi Conferences ☆72 · Mar 18, 2022 · Updated 3 years ago
- Using PyTorch autograd to compute Hessian of Perplexity for Large Language Models ☆27 · Apr 17, 2025 · Updated 10 months ago
- A Taichi implementation of WCSPH ☆16 · Dec 3, 2021 · Updated 4 years ago
- Taichi Implementation of "The Power Particle-in-Cell Method" ☆21 · Aug 21, 2022 · Updated 3 years ago
- ☆20 · Sep 28, 2024 · Updated last year
- ICML 2017 MEC: Memory-efficient Convolution for Deep Neural Network, C++ implementation (unofficial) ☆17 · Apr 9, 2019 · Updated 6 years ago
- Triton to TVM transpiler. ☆23 · Oct 14, 2024 · Updated last year
- GPU Affinity is a package to automatically set the CPU process affinity to match the hardware architecture on a given platform ☆29 · Dec 8, 2023 · Updated 2 years ago
- ☆20 · Dec 29, 2023 · Updated 2 years ago
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the … ☆264 · Updated this week
- End to End steps for adding custom ops in PyTorch. ☆24 · Aug 20, 2020 · Updated 5 years ago
- ☆88 · May 31, 2025 · Updated 9 months ago
- An extension of TVMScript to write simple and high performance GPU kernels with tensorcore. ☆51 · Jul 23, 2024 · Updated last year
- ☆22 · Apr 21, 2023 · Updated 2 years ago
- Puzzles for learning Triton, play it with minimal environment configuration! ☆634 · Dec 28, 2025 · Updated 2 months ago
- ☆12 · Sep 30, 2018 · Updated 7 years ago