No-GIL Python environment featuring NVIDIA Deep Learning libraries.
☆70 · Apr 14, 2025 · Updated 10 months ago
Alternatives and similar repositories for free-threaded-python
Users who are interested in free-threaded-python are comparing it to the libraries listed below.
- ☆13 · Jan 7, 2025 · Updated last year
- ☆16 · Feb 6, 2026 · Updated 3 weeks ago
- [WIP] Better (FP8) attention for Hopper ☆32 · Feb 24, 2025 · Updated last year
- A survey of manufacturer-provided DRAM operating parameters and timings as specified by DRAM chip datasheets from between 1970 and 2021. … ☆11 · May 4, 2022 · Updated 3 years ago
- A simple sparse bitmap implementation in Java ☆22 · Jan 28, 2016 · Updated 10 years ago
- ☆11 · Feb 26, 2024 · Updated 2 years ago
- ☆12 · Apr 7, 2025 · Updated 10 months ago
- (WIP) Parallel inference for black-forest-labs' FLUX model. ☆18 · Nov 18, 2024 · Updated last year
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆32 · Apr 2, 2025 · Updated 10 months ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆93 · Jan 16, 2026 · Updated last month
- Scalable GPU Kernel Fission/Fusion Transformation for Memory-Bound Kernels ☆14 · Aug 26, 2015 · Updated 10 years ago
- E-book for AIS1003 ☆18 · Oct 27, 2023 · Updated 2 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆94 · Nov 6, 2023 · Updated 2 years ago
- GPU Performance Advisor ☆66 · Jul 25, 2022 · Updated 3 years ago
- Reimplementation of the paper `Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words? (ACL2… ☆17 · Jul 10, 2020 · Updated 5 years ago
- Python interface to the QDLDL (https://github.com/osqp/qdldl) free LDL factorization routine for quasi-definite linear systems ☆16 · Feb 19, 2026 · Updated last week
- PyTorch implementation of the Flash Spectral Transform Unit. ☆21 · Sep 19, 2024 · Updated last year
- Open Source SSD Controller. NVMe and Lightstor variants ☆18 · May 21, 2014 · Updated 11 years ago
- Monitor parameter and gradient statistics during neural network training with Chainer ☆13 · Jan 24, 2017 · Updated 9 years ago
- A shell-friendly hyperparameter search tool inspired by Optuna ☆18 · Dec 17, 2024 · Updated last year
- NVIDIA DPU OPs collection ☆15 · Mar 6, 2023 · Updated 2 years ago
- Scale Optuna with Dask ☆36 · Oct 1, 2020 · Updated 5 years ago
- DeeperGEMM: crazy optimized version ☆74 · May 5, 2025 · Updated 9 months ago
- ☆22 · May 5, 2025 · Updated 9 months ago
- Handwritten GEMM using Intel AMX (Advanced Matrix Extensions) ☆17 · Jan 11, 2025 · Updated last year
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆144 · Updated this week
- ☆43 · Jan 24, 2026 · Updated last month
- PyCes (Python Code Scanner) - Enhanced Security Static Analysis Tool for Python ☆11 · Apr 18, 2019 · Updated 6 years ago
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference ☆57 · Nov 20, 2024 · Updated last year
- FluidNet re-written with ATen tensor lib ☆52 · Jun 17, 2019 · Updated 6 years ago
- Automatic virtualization of (general) accelerators. ☆46 · Nov 28, 2022 · Updated 3 years ago
- study of Ampere's Sparse Matmul ☆18 · Jan 10, 2021 · Updated 5 years ago
- ☆39 · Dec 14, 2025 · Updated 2 months ago
- image to column ☆30 · Jul 15, 2014 · Updated 11 years ago
- 5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs ☆57 · Nov 19, 2025 · Updated 3 months ago
- Heterogeneous Accelerated Computed Cluster (HACC) Resources Page ☆22 · Oct 7, 2025 · Updated 4 months ago
- ☆20 · Dec 24, 2024 · Updated last year
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆86 · Feb 11, 2026 · Updated 2 weeks ago
- ☆47 · Aug 15, 2019 · Updated 6 years ago