High Performance Int8 GEMM Kernels for SM80 and later GPUs.
☆23 · Mar 11, 2025 · Updated last year
Alternatives and similar repositories for gemm-int8
Users that are interested in gemm-int8 are comparing it to the libraries listed below.
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. ☆21 · Jan 24, 2025 · Updated last year
- PyTorch Quantization Framework For OCP MX Datatypes. ☆16 · May 30, 2025 · Updated last year
- ADAPTIVE RESONANCE THEORY. Gail A. Carpenter and Stephen Grossberg ☆10 · Feb 10, 2015 · Updated 11 years ago
- ☆12 · Apr 3, 2023 · Updated 3 years ago
- Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25] ☆12 · Nov 8, 2024 · Updated last year
- This repository contains the training code of ParetoQ introduced in our work "ParetoQ Scaling Laws in Extremely Low-bit LLM Quantization" ☆128 · Oct 15, 2025 · Updated 8 months ago
- Official implementation of "Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent". ☆23 · May 23, 2025 · Updated last year
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API ☆37 · Sep 15, 2023 · Updated 2 years ago
- ☆15 · Dec 5, 2024 · Updated last year
- A DAG processor and compiler for a tree-based spatial datapath. ☆16 · Aug 24, 2022 · Updated 3 years ago
- ☆88 · Apr 18, 2025 · Updated last year
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. ☆15 · Aug 31, 2023 · Updated 2 years ago
- ☆39 · Feb 28, 2020 · Updated 6 years ago
- Domain Agnostic Fourier Neural Operators (DAFNO) ☆20 · Sep 3, 2024 · Updated last year
- [COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models" ☆76 · Jul 8, 2025 · Updated 11 months ago
- ☆21 · Feb 5, 2024 · Updated 2 years ago
- Structured Binary Neural Networks for Image Recognition ☆18 · Nov 18, 2021 · Updated 4 years ago
- The code for the Network Binarization via Contrastive Learning, which has been accepted to ECCV 2022. ☆14 · Jul 13, 2022 · Updated 3 years ago
- ucas hpc course code ☆15 · May 24, 2023 · Updated 3 years ago
- ☆20 · Jan 4, 2024 · Updated 2 years ago
- PyTorch extension enabling direct access to cuDNN-accelerated C++ convolution functions. ☆13 · Mar 14, 2021 · Updated 5 years ago
- My solution code to parallel architecture and programming Spring 2016 ☆12 · Aug 15, 2016 · Updated 9 years ago
- Code release for “DeepLag: Discovering Deep Lagrangian Dynamics for Intuitive Fluid Prediction” (NeurIPS 2024), https://arxiv.org/a… ☆23 · Oct 31, 2024 · Updated last year
- ☆38 · Jun 4, 2026 · Updated 3 weeks ago
- ☆25 · Apr 13, 2025 · Updated last year
- Curated list of methods that focuses on improving the efficiency of diffusion models ☆43 · Jul 9, 2024 · Updated last year
- 🎓Automatically Update circult-eda-mlsys-tinyml Papers Daily using Github Actions (Update Every 8th hours) ☆10 · Jun 22, 2026 · Updated last week
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs ☆68 · Mar 25, 2025 · Updated last year
- Official repo for vidar and vidarc: video foundation model for robotics. ☆42 · Dec 22, 2025 · Updated 6 months ago
- Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators ☆124 · Oct 26, 2022 · Updated 3 years ago
- ☆31 · Mar 24, 2025 · Updated last year
- ☆22 · Jun 7, 2024 · Updated 2 years ago
- First Latency-Aware Competitive LLM Agent Benchmark ☆29 · Jun 3, 2025 · Updated last year
- sast2022-pytorch-training ☆11 · Jul 21, 2022 · Updated 3 years ago
- Container Traits for Modern C++ ☆29 · Oct 11, 2020 · Updated 5 years ago
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx… ☆29 · Feb 17, 2025 · Updated last year
- ☆39 · Updated this week
- Video stabilization using IMU motion data from internal or external logs ☆22 · Feb 4, 2022 · Updated 4 years ago
- FastTree 2: Approximately-Maximum-Likelihood Trees for Large Alignments ☆39 · Apr 30, 2026 · Updated 2 months ago