aiha-lab / TernGEMM
TernGEMM: General Matrix Multiply Library with Ternary Weights for Fast DNN Inference
☆13 Updated 3 years ago
Alternatives and similar repositories for TernGEMM
Users interested in TernGEMM are comparing it to the libraries listed below.
- [ICASSP'22] Integer-only Zero-shot Quantization for Efficient Speech Recognition ☆33 Updated 3 years ago
- In this repository, we explore model compression for transformer architectures via quantization. We specifically explore quantization awa… ☆24 Updated 4 years ago
- ☆15 Updated last year
- ☆11 Updated last year
- [ICCAD 2025] Squant ☆14 Updated 2 weeks ago
- This repository contains the results and code for the MLPerf™ Tiny Inference v0.7 benchmark. ☆18 Updated 2 years ago
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware. ☆110 Updated 7 months ago
- [ICML'21 Oral] I-BERT: Integer-only BERT Quantization ☆253 Updated 2 years ago
- ☆157 Updated last year
- ☆206 Updated 3 years ago
- ☆16 Updated 2 years ago
- Fast matrix multiplication for few-bit integer matrices on CPUs. ☆28 Updated 6 years ago
- ☆18 Updated 4 years ago
- ☆10 Updated 2 years ago
- Code for ICML25 Paper "Overcoming Non-monotonicity in Transducer-based Streaming Generation" ☆11 Updated last month
- Layer-wise Pruning of Transformer Heads for Efficient Language Modeling ☆21 Updated 3 years ago
- ☆61 Updated last year
- ☆77 Updated 5 months ago
- The accompanying code for "Exploring the limits of decoder-only models trained on public speech recognition corpora" (Ankit Gupta, George… ☆19 Updated 9 months ago
- The official, proof-of-concept C++ implementation of PocketNN. ☆34 Updated last year
- ☆28 Updated 11 months ago
- The official PyTorch implementation of the NeurIPS2022 (spotlight) paper, Outlier Suppression: Pushing the Limit of Low-bit Transformer L… ☆47 Updated 2 years ago
- ☆15 Updated 2 years ago
- [NeurIPS'23] Speculative Decoding with Big Little Decoder ☆93 Updated last year
- The project for speech translation ☆11 Updated last year
- Refactored version of https://github.com/ming024/FastSpeech2 ☆14 Updated 3 years ago
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆95 Updated 6 years ago
- Pytorch implementation of BiFSMNv2, TNNLS 2023 ☆31 Updated 2 years ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆80 Updated 10 months ago
- ☆20 Updated last year