Example of binding a TF32 CUTLASS GEMM kernel to PyTorch
☆12 Jun 7, 2024 Updated last year
Alternatives and similar repositories for tf32_gemm
Users that are interested in tf32_gemm are comparing it to the libraries listed below.
- ☆16 Sep 24, 2024 Updated last year
- Inference Llama 2 with a model compiled to native code by TorchInductor ☆14 Feb 8, 2024 Updated 2 years ago
- ☆20 May 24, 2025 Updated 11 months ago
- Efficient-Tensor-Management-on-HM-for-Deep-Learning ☆11 Nov 15, 2021 Updated 4 years ago
- [OSDI 2025] DecDEC: A Systems Approach to Advancing Low‑Bit LLM Quantization ☆24 Jan 29, 2026 Updated 3 months ago
- This repository consists of useful tools or guides for system software development or anything interesting. ☆11 Feb 27, 2026 Updated 2 months ago
- extensible collectives library in triton ☆98 Mar 31, 2025 Updated last year
- A fast implementation of log() and exp() ☆58 Dec 14, 2022 Updated 3 years ago
- An LLM-based system that fully automates Chaos Engineering (ASE 2025, NIER track) ☆26 Apr 6, 2026 Updated 3 weeks ago
- ☆15 Dec 29, 2022 Updated 3 years ago
- Triton-based Symmetric Memory operators and examples ☆98 Mar 28, 2026 Updated last month
- ☆13 Dec 19, 2019 Updated 6 years ago
- A Collection of GitHub Profiles with awesome readme ☆14 Aug 17, 2023 Updated 2 years ago
- A PyTorch reimplementation of multi-agent deep deterministic policy gradient (MADDPG), adapting https://github.com/xuemei-ye/maddpg-mpe to run on the author's own computer. Because the author's laptop has no CUDA, the experiment speed… ☆14 May 10, 2019 Updated 6 years ago
- ☆14 Mar 8, 2023 Updated 3 years ago
- Transformer-based Long Document Classification ☆17 Nov 2, 2022 Updated 3 years ago
- Tensors and Dynamic neural networks in Python with strong GPU acceleration ☆15 Dec 21, 2020 Updated 5 years ago
- Useful low-level (“base”) routines from Chromium ☆27 Nov 30, 2015 Updated 10 years ago
- ☆13 Oct 13, 2021 Updated 4 years ago
- Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024) ☆19 May 28, 2024 Updated last year
- ☆15 Mar 30, 2024 Updated 2 years ago
- Pairwise Controlled Manifold Approximation (PaCMAP) for dimensionality reduction ☆20 Feb 3, 2026 Updated 2 months ago
- Disable YubiKey output on MacOS without a modifier key pressed ☆10 Aug 10, 2022 Updated 3 years ago
- Unofficial mirror of pdftk - imported using git-ubuntu ☆10 Aug 20, 2018 Updated 7 years ago
- A compiler from Go to JavaScript for running Go code in a browser ☆29 Dec 6, 2017 Updated 8 years ago
- A perl script for searching and replacing in mathematics in LaTeX documents. ☆13 Mar 31, 2026 Updated last month
- ☆21 Oct 31, 2025 Updated 6 months ago
- Ongoing research training transformer models at scale ☆18 Apr 9, 2026 Updated 3 weeks ago
- QuickJS for WASI ☆31 Jan 29, 2024 Updated 2 years ago
- ☆12 Dec 8, 2022 Updated 3 years ago
- This repository holds the data and code for the AndroR2 dataset of manually-reproduced bug reports for Android apps ☆25 Jun 11, 2021 Updated 4 years ago
- PyCUDA Gaussian Blur ☆18 Mar 24, 2019 Updated 7 years ago
- A Python library for efficient feature ranking and selection on sparse data sets. ☆23 Mar 3, 2026 Updated last month
- ☆18 Dec 29, 2018 Updated 7 years ago
- GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs ☆16 Apr 18, 2025 Updated last year
- ☆16 Feb 23, 2021 Updated 5 years ago
- Analysis of the MovieLens dataset of movie ratings and reviews. ☆11 Sep 2, 2018 Updated 7 years ago
- A Python Library for the 3GPP physical layer ☆17 Dec 18, 2025 Updated 4 months ago
- Intelligent Resource Requirement Estimation and Scheduling for Deep Learning Jobs on Distributed GPU Clusters ☆15 Nov 18, 2021 Updated 4 years ago