quic / efficient-transformers
This library lets users port pretrained models and checkpoints from the HuggingFace (HF) hub (developed with the HF transformers library) into inference-ready formats that run efficiently on Qualcomm Cloud AI 100 accelerators.
☆68 Updated this week
Alternatives and similar repositories for efficient-transformers
Users who are interested in efficient-transformers are comparing it to the libraries listed below.
- Model compression for ONNX ☆96 Updated 6 months ago
- A block oriented training approach for inference time optimization. ☆33 Updated 9 months ago
- Qualcomm Cloud AI SDK (Platform and Apps) enables high performance deep learning inference on Qualcomm Cloud AI platforms delivering high … ☆61 Updated 2 weeks ago
- Nsight Systems In Docker ☆20 Updated last year
- ☆10 Updated 3 weeks ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆127 Updated this week
- ☆31 Updated 11 months ago
- ☆67 Updated 7 months ago
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆42 Updated last year
- This repository contains the training code of ParetoQ, introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" ☆64 Updated last week
- ☆26 Updated last year
- High-speed GEMV kernels, up to 2.7x speedup compared to the PyTorch baseline. ☆109 Updated 10 months ago
- ☆149 Updated 2 years ago
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. ☆13 Updated 4 months ago
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆43 Updated last week
- NASRec: Weight Sharing Neural Architecture Search for Recommender Systems ☆30 Updated last year
- ☆73 Updated 4 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆64 Updated last year
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆94 Updated 6 years ago
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆39 Updated 3 weeks ago
- [ECCV 2022] SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning ☆20 Updated 2 years ago
- ☆30 Updated last year
- Repository for CPU Kernel Generation for LLM Inference ☆26 Updated last year
- ☆157 Updated last year
- AI Edge Quantizer: flexible post training quantization for LiteRT models. ☆41 Updated last week
- Arch-Net: Model Distillation for Architecture Agnostic Model Deployment ☆22 Updated 3 years ago
- Home for OctoML PyTorch Profiler ☆113 Updated 2 years ago
- ☆71 Updated 2 months ago
- ☆18 Updated 2 years ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆108 Updated 7 months ago