ELS-RD / transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
☆1,659
Related projects
Alternatives and complementary repositories for transformer-deploy
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… (☆1,535)
- Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data. (☆980)
- Ongoing research training transformer language models at scale, including: BERT & GPT-2. (☆1,338)
- ⚡ Boost inference speed of T5 models by 5x and reduce the model size by 3x. (☆566)
- 🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools. (☆2,576)
- PyTorch extensions for high-performance and large-scale training. (☆3,195)
- Library for 8-bit optimizers and quantization routines. (☆714)
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". (☆1,941)
- Parallelformers: an efficient model parallelization toolkit for deployment. (☆778)
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. (☆1,904)
- A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU. (☆1,484)
- Prune a model while fine-tuning or training. (☆394)
- FastFormers: highly efficient transformer models for NLU. (☆701)
- Fast inference solutions for BLOOM. (☆560)
- Ongoing research training transformer language models at scale, including: BERT & GPT-2. (☆1,893)
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs… (☆1,979)
- Flexible components pairing 🤗 Transformers with PyTorch Lightning. (☆611)
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models. (☆1,257)
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster. (☆1,011)
- Transformer-related optimization, including BERT and GPT. (☆5,890)
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments. (☆742)
- Running BERT without padding. (☆460)
- 🤗 Evaluate: a library for easily evaluating machine learning models and datasets. (☆2,037)
- A unified library for parameter-efficient and modular transfer learning. (☆2,581)
- An efficient implementation of popular sequence models for text generation, summarization, and translation tasks. https://arxiv.org/p… (☆433)
- Accessible large language models via k-bit quantization for PyTorch. (☆6,299)
- Sparsity-aware deep learning inference runtime for CPUs. (☆3,028)
- Tools to download and clean up Common Crawl data. (☆971)
- Triton backend that enables pre-processing, post-processing, and other logic to be implemented in Python. (☆553)