No-GIL Python environment featuring NVIDIA Deep Learning libraries.
☆70 Apr 14, 2025 Updated 11 months ago
Alternatives and similar repositories for free-threaded-python
Users who are interested in free-threaded-python are comparing it to the libraries listed below.
- ☆13 Jan 7, 2025 Updated last year
- Benchmark tests supporting the TiledCUDA library. ☆18 Nov 19, 2024 Updated last year
- [WIP] Better (FP8) attention for Hopper ☆32 Feb 24, 2025 Updated last year
- ☆16 Feb 26, 2026 Updated 3 weeks ago
- study of cutlass ☆22 Nov 10, 2024 Updated last year
- A simple sparse bitmap implementation in java ☆22 Jan 28, 2016 Updated 10 years ago
- (WIP) Parallel inference for black-forest-labs' FLUX model. ☆19 Nov 18, 2024 Updated last year
- ☆11 Feb 26, 2024 Updated 2 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆94 Nov 6, 2023 Updated 2 years ago
- #UAI2020 Codes for PAC-Bayesian Contrastive Unsupervised Representation Learning ☆14 May 23, 2022 Updated 3 years ago
- A survey of manufacturer-provided DRAM operating parameters and timings as specified by DRAM chip datasheets from between 1970 and 2021. … ☆11 May 4, 2022 Updated 3 years ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆31 Apr 2, 2025 Updated 11 months ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆95 Jan 16, 2026 Updated 2 months ago
- Handwritten GEMM using Intel AMX (Advanced Matrix Extension) ☆17 Jan 11, 2025 Updated last year
- Xmixers: A collection of SOTA efficient token/channel mixers ☆28 Sep 4, 2025 Updated 6 months ago
- A shell-friendly hyperparameter search tool inspired by Optuna ☆18 Dec 17, 2024 Updated last year
- ☆22 May 5, 2025 Updated 10 months ago
- Tensor Basis Neural Network for Scalar Mixing ☆10 Mar 24, 2023 Updated 2 years ago
- GVProf: A Value Profiler for GPU-based Clusters ☆53 Mar 24, 2024 Updated last year
- Monitor parameter and gradient statistics during neural network training with Chainer ☆13 Jan 24, 2017 Updated 9 years ago
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust. ☆13 Nov 23, 2024 Updated last year
- Reimplementation of the paper `Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words? (ACL2… ☆17 Jul 10, 2020 Updated 5 years ago
- ImageNet-12k subset of ImageNet-21k (fall11) ☆21 Jun 13, 2023 Updated 2 years ago
- Automated GPU Kernel Generation via Co-Evolving Intrinsic World Model ☆85 Mar 2, 2026 Updated 2 weeks ago
- A backend-dispatchable version of NumPy. ☆19 Feb 27, 2021 Updated 5 years ago
- Process-based Asynchronous Progress Model for MPI Communication ☆11 Jan 24, 2021 Updated 5 years ago
- PyTorch implementation of the Flash Spectral Transform Unit. ☆22 Sep 19, 2024 Updated last year
- GPU Performance Advisor ☆66 Jul 25, 2022 Updated 3 years ago
- ☆27 Mar 14, 2024 Updated 2 years ago
- [ICLR 2026] Official implementation of DiCache: Let Diffusion Model Determine Its Own Cache ☆58 Jan 26, 2026 Updated last month
- A Python framework using OPM Flow for the SPE11 benchmark project ☆18 Feb 2, 2026 Updated last month
- Open Source SSD Controller. NVMe and Lightstor variants ☆17 May 21, 2014 Updated 11 years ago
- A practical way of learning Swizzle ☆37 Feb 3, 2025 Updated last year
- Linux kernel to support Mellanox BlueField SoCs ☆14 Nov 13, 2019 Updated 6 years ago
- Scalable GPU Kernel Fission/Fusion Transformation for Memory-Bound Kernels ☆14 Aug 26, 2015 Updated 10 years ago
- ☆19 Mar 22, 2024 Updated last year
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆146 Mar 10, 2026 Updated last week
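- Deploys CoupledTPS with OpenCV, covering three models: portrait correction, rectangling of images with irregular boundaries, and rotated-image correction. As before, both C++ and Python versions of the program are included. ☆20 Jul 4, 2024 Updated last year
- A napari plugin to load & deskew folders of lattice light sheet TIFFs ☆13 Jan 5, 2026 Updated 2 months ago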