quic / efficient-transformers
This library lets users port pretrained models and checkpoints from the HuggingFace (HF) hub (built with the HF transformers library) into inference-ready formats that run efficiently on Qualcomm Cloud AI 100 accelerators.
☆60 · Updated this week
Alternatives and similar repositories for efficient-transformers:
Users who are interested in efficient-transformers are comparing it to the libraries listed below:
- Qualcomm Cloud AI SDK (Platform and Apps) enable high-performance deep learning inference on Qualcomm Cloud AI platforms delivering high … ☆56 · Updated 4 months ago
- ☆31 · Updated 9 months ago
- Model compression for ONNX ☆87 · Updated 4 months ago
- Snapdragon Neural Processing Engine (SNPE) SDK: The Snapdragon Neural Processing Engine (SNPE) is a Qualcomm Snapdragon software accelerate… ☆34 · Updated 2 years ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆64 · Updated 3 years ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆55 · Updated 5 months ago
- Home for OctoML PyTorch Profiler ☆108 · Updated last year
- Nsight Systems In Docker ☆20 · Updated last year
- Arch-Net: Model Distillation for Architecture Agnostic Model Deployment ☆22 · Updated 3 years ago
- ☆149 · Updated last year
- ☆22 · Updated last year
- Awesome Quantization Paper lists with Codes ☆11 · Updated 4 years ago
- ☆141 · Updated 2 years ago
- Open Source Projects from Pallas Lab ☆20 · Updated 3 years ago
- ☆64 · Updated last month
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆22 · Updated 9 months ago
- A block-oriented training approach for inference-time optimization. ☆32 · Updated 7 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆91 · Updated this week
- pytorch-profiler ☆51 · Updated last year
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆152 · Updated 9 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆189 · Updated this week
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 · Updated last year
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆141 · Updated this week
- ☆202 · Updated 3 years ago
- Customized matrix multiplication kernels ☆53 · Updated 3 years ago
- CUDA Matrix Multiplication Optimization ☆173 · Updated 8 months ago
- NASRec Weight Sharing Neural Architecture Search for Recommender Systems ☆29 · Updated last year
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆38 · Updated 2 weeks ago
- ☆27 · Updated 11 months ago
- ☆18 · Updated 2 years ago