quic / efficient-transformers
This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transformers library) into inference-ready formats that run efficiently on Qualcomm Cloud AI 100 accelerators.
☆71Updated this week
Alternatives and similar repositories for efficient-transformers
Users that are interested in efficient-transformers are comparing it to the libraries listed below
- ☆32Updated 2 weeks ago
- ☆205Updated 3 years ago
- Model compression for ONNX☆96Updated 7 months ago
- ☆149Updated 2 years ago
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai☆51Updated last week
- Nsight Systems In Docker☆20Updated last year
- ☆31Updated last year
- ☆12Updated last month
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX☆150Updated this week
- Qualcomm Cloud AI SDK (Platform and Apps) enables high performance deep learning inference on Qualcomm Cloud AI platforms delivering high …☆61Updated last month
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training☆49Updated last month
- ☆157Updated last year
- High Performance FP8 GEMM Kernels for SM89 and later GPUs.☆14Updated 5 months ago
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware.☆110Updated 6 months ago
- The Triton backend for the PyTorch TorchScript models.☆152Updated last week
- A block oriented training approach for inference time optimization.☆33Updated 10 months ago
- Dynamic Neural Architecture Search Toolkit☆30Updated 6 months ago
- ☆75Updated 5 months ago
- Home for OctoML PyTorch Profiler☆113Updated 2 years ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.☆167Updated this week
- ☆69Updated 2 years ago
- This repository contains the training code of ParetoQ introduced in our work "ParetoQ Scaling Laws in Extremely Low-bit LLM Quantization"☆80Updated 3 weeks ago
- Snapdragon Neural Processing Engine (SNPE) SDK. The Snapdragon Neural Processing Engine (SNPE) is a Qualcomm Snapdragon software accelerate…☆34Updated 3 years ago
- Memory Optimizations for Deep Learning (ICML 2023)☆64Updated last year
- Awesome Quantization Paper lists with Codes☆11Updated 4 years ago
- ☆26Updated last year
- Benchmark code for the "Online normalizer calculation for softmax" paper☆94Updated 6 years ago
- Mobile App Open☆54Updated last week
- IntLLaMA: A fast and light quantization solution for LLaMA☆18Updated last year
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i…☆180Updated 2 weeks ago