quic/efficient-transformers
This library lets users seamlessly port pretrained models and checkpoints from the Hugging Face (HF) Hub (built with the HF Transformers library) into inference-ready formats that run efficiently on Qualcomm Cloud AI 100 accelerators.
☆57 · Updated this week
Alternatives and similar repositories for efficient-transformers:
Users interested in efficient-transformers are comparing it to the libraries listed below.
- Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high … ☆55 · Updated 2 months ago
- Model compression for ONNX ☆81 · Updated 2 months ago
- ☆57 · Updated 7 months ago
- A block-oriented training approach for inference-time optimization. ☆32 · Updated 5 months ago
- Fast low-bit matmul kernels in Triton ☆187 · Updated last week
- LLaMA INT4 CUDA inference with AWQ ☆49 · Updated 6 months ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆63 · Updated 2 years ago
- ☆21 · Updated last week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆182 · Updated this week
- ☆131 · Updated last year
- Tritonbench is a collection of PyTorch custom operators with example inputs for measuring their performance. ☆75 · Updated this week
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆22 · Updated 7 months ago
- Nsight Systems in Docker ☆19 · Updated last year
- CUDA Matrix Multiplication Optimization ☆152 · Updated 6 months ago
- ☆56 · Updated 3 months ago
- Cataloging released Triton kernels. ☆156 · Updated last week
- ☆22 · Updated last year
- ☆170 · Updated last week
- High-speed GEMV kernels with up to 2.7x speedup over the PyTorch baseline. ☆93 · Updated 6 months ago
- A safetensors extension for efficiently storing sparse quantized tensors on disk ☆64 · Updated this week
- Open-source projects from Pallas Lab ☆20 · Updated 3 years ago
- Standalone Flash Attention v2 kernel without a libtorch dependency ☆99 · Updated 4 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 · Updated 10 months ago
- ☆157 · Updated last year
- ☆96 · Updated 4 months ago
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆21 · Updated 3 weeks ago
- FlexAttention with FlashAttention-3 support ☆27 · Updated 3 months ago
- ☆25 · Updated last year
- Learning about CUDA by writing PTX code. ☆31 · Updated 10 months ago
- ☆54 · Updated last month