Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
⭐ 1,688 · Oct 23, 2024 · Updated last year
Alternatives and similar repositories for transformer-deploy
Users that are interested in transformer-deploy are comparing it to the libraries listed below.
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable (⭐ 1,585 · Jan 28, 2026 · Updated 2 months ago)
- ⚡ Boost inference speed of T5 models by 5x & reduce the model size by 3x. (⭐ 590 · Apr 24, 2023 · Updated 2 years ago)
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… (⭐ 3,354 · Apr 2, 2026 · Updated last week)
- Transformer related optimization, including BERT, GPT (⭐ 6,412 · Mar 27, 2024 · Updated 2 years ago)
- The Triton Inference Server provides an optimized cloud and edge inferencing solution. (⭐ 10,533 · Updated this week)
- LightSeq: A High Performance Library for Sequence Processing and Generation (⭐ 3,300 · May 16, 2023 · Updated 2 years ago)
- A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU. (⭐ 1,545 · Jul 18, 2025 · Updated 8 months ago)
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. (⭐ 2,107 · Jun 30, 2025 · Updated 9 months ago)
- AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… (⭐ 4,716 · Apr 7, 2026 · Updated last week)
- Parallelformers: An Efficient Model Parallelization Toolkit for Deployment (⭐ 789 · Apr 24, 2023 · Updated 2 years ago)
- Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the… (⭐ 2,094 · Aug 15, 2024 · Updated last year)
- FastFormers: highly efficient transformer models for NLU (⭐ 709 · Mar 21, 2025 · Updated last year)
- Accessible large language models via k-bit quantization for PyTorch. (⭐ 8,107 · Updated this week)
- skweak: A software toolkit for weak supervision applied to NLP tasks (⭐ 926 · Sep 2, 2024 · Updated last year)
- State-of-the-Art Text Embeddings (⭐ 18,534 · Updated this week)
- NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations (⭐ 787 · May 19, 2024 · Updated last year)
- Efficient few-shot learning with Sentence Transformers (⭐ 2,710 · Apr 2, 2026 · Updated last week)
- Serve, optimize and scale PyTorch models in production (⭐ 4,360 · Aug 6, 2025 · Updated 8 months ago)
- Sparsity-aware deep learning inference runtime for CPUs (⭐ 3,163 · Jun 2, 2025 · Updated 10 months ago)
- Large Language Model Text Generation Inference (⭐ 10,830 · Mar 21, 2026 · Updated 3 weeks ago)
- Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets (⭐ 4,925 · Apr 6, 2026 · Updated last week)
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) (⭐ 4,743 · Jan 8, 2024 · Updated 2 years ago)
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… (⭐ 9,596 · Apr 2, 2026 · Updated last week)
- OSLO: Open Source framework for Large-scale model Optimization (⭐ 309 · Aug 25, 2022 · Updated 3 years ago)
- PyTorch extensions for high performance and large scale training. (⭐ 3,405 · Apr 26, 2025 · Updated 11 months ago)
- PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT (⭐ 2,963 · Updated this week)
- A collection of libraries to optimize AI model performance (⭐ 8,349 · Jul 22, 2024 · Updated last year)
- Data augmentation for NLP (⭐ 4,656 · Jun 24, 2024 · Updated last year)
- Library for 8-bit optimizers and quantization routines. (⭐ 779 · Aug 18, 2022 · Updated 3 years ago)
- Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering. (⭐ 1,754 · Dec 20, 2023 · Updated 2 years ago)
- Fast inference engine for Transformer models (⭐ 4,417 · Feb 4, 2026 · Updated 2 months ago)
- A Unified Library for Parameter-Efficient and Modular Transfer Learning (⭐ 2,810 · Mar 21, 2026 · Updated 3 weeks ago)
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities (⭐ 22,086 · Jan 23, 2026 · Updated 2 months ago)
- Prune a model while finetuning or training. (⭐ 406 · Jun 21, 2022 · Updated 3 years ago)
- Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conve… (⭐ 4,239 · Aug 25, 2025 · Updated 7 months ago)
- Fast and memory-efficient exact attention (⭐ 23,185 · Apr 6, 2026 · Updated last week)
- An efficient implementation of the popular sequence models for text generation, summarization, and translation tasks. https://arxiv.org/p… (⭐ 433 · Aug 17, 2022 · Updated 3 years ago)
- Foundation Architecture for (M)LLMs (⭐ 3,135 · Apr 11, 2024 · Updated 2 years ago)
- Running large language models on a single GPU for throughput-oriented scenarios. (⭐ 9,375 · Oct 28, 2024 · Updated last year)