Example of binding a TF32 CUTLASS GEMM kernel to PyTorch
☆12Jun 7, 2024Updated last year
Alternatives and similar repositories for tf32_gemm
Users that are interested in tf32_gemm are comparing it to the libraries listed below.
- ☆16Sep 24, 2024Updated last year
- Inference Llama 2 with a model compiled to native code by TorchInductor☆14Feb 8, 2024Updated 2 years ago
- ☆20May 24, 2025Updated 9 months ago
- Efficient-Tensor-Management-on-HM-for-Deep-Learning☆10Nov 15, 2021Updated 4 years ago
- This repository consists of useful tools or guides for system software development or anything interesting.☆11Feb 27, 2026Updated 3 weeks ago
- A fast implementation of log() and exp()☆57Dec 14, 2022Updated 3 years ago
- extensible collectives library in triton☆97Mar 31, 2025Updated 11 months ago
- An LLM-based system that fully automates Chaos Engineering (ASE 2025, NIER track)☆25Jan 16, 2026Updated 2 months ago
- ☆15Dec 29, 2022Updated 3 years ago
- Triton-based Symmetric Memory operators and examples☆91Jan 15, 2026Updated 2 months ago
- ☆13Dec 19, 2019Updated 6 years ago
- A Collection of GitHub Profiles with awesome readme☆14Aug 17, 2023Updated 2 years ago
- Reimplementation of multi-agent deep deterministic policy gradient (MADDPG) in PyTorch, adapting https://github.com/xuemei-ye/maddpg-mpe to run on my own computer. Because my laptop has no CUDA, the experiment speed…☆14May 10, 2019Updated 6 years ago
- ☆14Mar 8, 2023Updated 3 years ago
- Transformer-based Long Document Classification☆17Nov 2, 2022Updated 3 years ago
- Tensors and Dynamic neural networks in Python with strong GPU acceleration☆15Dec 21, 2020Updated 5 years ago
- Useful low-level (“base”) routines from Chromium☆27Nov 30, 2015Updated 10 years ago
- ☆13Oct 13, 2021Updated 4 years ago
- ☆15Mar 30, 2024Updated last year
- Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024)☆19May 28, 2024Updated last year
- Pairwise Controlled Manifold Approximation (PaCMAP) for dimensionality reduction☆20Feb 3, 2026Updated last month
- Unofficial mirror of pdftk - imported using git-ubuntu☆10Aug 20, 2018Updated 7 years ago
- Disable YubiKey output on MacOS without a modifier key pressed☆10Aug 10, 2022Updated 3 years ago
- A compiler from Go to JavaScript for running Go code in a browser☆29Dec 6, 2017Updated 8 years ago
- A perl script for searching and replacing in mathematics in LaTeX documents.☆13Jul 21, 2021Updated 4 years ago
- Ongoing research training transformer models at scale☆18Updated this week
- ☆12Dec 8, 2022Updated 3 years ago
- ☆20Oct 31, 2025Updated 4 months ago
- QuickJS for WASI☆31Jan 29, 2024Updated 2 years ago
- A Python Library for the 3GPP physical layer☆15Dec 18, 2025Updated 3 months ago
- This repository holds the data and code for the AndroR2 dataset of manually-reproduced bug reports for Android apps☆25Jun 11, 2021Updated 4 years ago
- PyCUDA Gaussian Blur☆18Mar 24, 2019Updated 6 years ago
- ☆18Dec 29, 2018Updated 7 years ago
- A Python library for efficient feature ranking and selection on sparse data sets.☆23Mar 3, 2026Updated 2 weeks ago
- ☆16Feb 23, 2021Updated 5 years ago
- GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs☆16Apr 18, 2025Updated 11 months ago
- Analysis of the MovieLens dataset of movie ratings and reviews.☆11Sep 2, 2018Updated 7 years ago
- Intelligent Resource Requirement Estimation and Scheduling for Deep Learning Jobs on Distributed GPU Clusters☆15Nov 18, 2021Updated 4 years ago
- Multiple GEMM operators are constructed with cutlass to support LLM inference.☆19Aug 3, 2025Updated 7 months ago