quic / efficient-transformers
This library lets users port pretrained models and checkpoints from the HuggingFace (HF) hub (developed with the HF transformers library) into inference-ready formats that run efficiently on Qualcomm Cloud AI 100 accelerators.
☆50 Updated this week
Related projects
Alternatives and complementary repositories for efficient-transformers
- Model compression for ONNX ☆75 Updated this week
- ☆55 Updated 5 months ago
- Qualcomm Cloud AI SDK (Platform and Apps) enable high performance deep learning inference on Qualcomm Cloud AI platforms delivering high … ☆55 Updated 3 weeks ago
- ☆47 Updated 2 months ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆63 Updated 2 years ago
- ☆24 Updated 7 months ago
- Customized matrix multiplication kernels ☆53 Updated 2 years ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆50 Updated this week
- Notes and artifacts from the ONNX steering committee ☆25 Updated last week
- Home for OctoML PyTorch Profiler ☆107 Updated last year
- Open Source Projects from Pallas Lab ☆20 Updated 3 years ago
- Applied AI experiments and examples for PyTorch ☆166 Updated 3 weeks ago
- High-speed GEMV kernels, with up to 2.7x speedup compared to the PyTorch baseline. ☆90 Updated 4 months ago
- ☆153 Updated this week
- ☆30 Updated 5 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆60 Updated 8 months ago
- ☆156 Updated last year
- GEMM and Winograd based convolutions using CUTLASS ☆25 Updated 4 years ago
- A block oriented training approach for inference time optimization. ☆30 Updated 3 months ago
- Cataloging released Triton kernels. ☆134 Updated 2 months ago
- pytorch-profiler ☆50 Updated last year
- ☆67 Updated last year
- llama INT4 cuda inference with AWQ ☆48 Updated 4 months ago
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆59 Updated 6 years ago
- Fast sparse deep learning on CPUs ☆51 Updated 2 years ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆38 Updated 10 months ago
- MLPerf™ logging library ☆30 Updated this week
- ☆18 Updated 2 years ago
- Nsight Systems in Docker ☆17 Updated 11 months ago
- Training neural networks in TensorFlow 2.0 with 5x less memory ☆129 Updated 2 years ago