Tencent/TurboTransformers

A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT2, decoders, etc.) on CPU and GPU.

☆1,546

Alternatives and similar repositories for TurboTransformers

Users who are interested in TurboTransformers compare it with the libraries listed below.

bytedance / lightseq
LightSeq: A High Performance Library for Sequence Processing and Generation
☆3,296 · May 16, 2023 · Updated 3 years ago
bytedance / effective_transformer
Running BERT without Padding
☆479 · Mar 18, 2022 · Updated 4 years ago
NVIDIA / FasterTransformer
Transformer related optimization, including BERT, GPT
☆6,439 · Mar 27, 2024 · Updated 2 years ago
zhihu / cuBERT
Fast implementation of BERT inference directly on NVIDIA (CUDA, CUBLAS) and Intel MKL
☆547 · Nov 18, 2020 · Updated 5 years ago
huawei-noah / Pretrained-Language-Model
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
☆3,162 · Jan 22, 2024 · Updated 2 years ago
autoliuweijie / FastBERT
The source code of FastBERT (ACL2020)
☆605 · Oct 29, 2021 · Updated 4 years ago
bytedance / ByteTransformer
Optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052
☆479 · Mar 15, 2024 · Updated 2 years ago
airaria / TextBrewer
A PyTorch-based knowledge distillation toolkit for natural language processing
☆1,704 · May 8, 2023 · Updated 3 years ago
microsoft / fastformers
FastFormers - highly efficient transformer models for NLU
☆706 · Mar 21, 2025 · Updated last year
ELS-RD / transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
☆1,688 · Oct 23, 2024 · Updated last year
bytedance / byteps
A high performance and generic framework for distributed DNN training
☆3,718 · Oct 3, 2023 · Updated 2 years ago
Tencent / Forward
A library for high performance deep learning inference on NVIDIA GPUs.
☆556 · Jan 29, 2022 · Updated 4 years ago
brightmart / albert_zh
A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS, a large collection of Chinese pre-trained ALBERT models
☆3,979 · Nov 21, 2022 · Updated 3 years ago
NVIDIA / Megatron-LM
Ongoing research training transformer models at scale
☆17,108 · Updated this week
microsoft / nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
☆1,002 · Sep 19, 2024 · Updated last year
mit-han-lab / lite-transformer
[ICLR 2020] Lite Transformer with Long-Short Range Attention
☆609 · Jul 11, 2024 · Updated 2 years ago
triton-inference-server / server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
☆10,853 · Updated this week
ymcui / Chinese-ELECTRA
Pre-trained Chinese ELECTRA models
☆1,433 · Apr 19, 2026 · Updated 3 months ago
NVIDIA / DeepLearningExamples
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enter…
☆14,831 · Aug 12, 2024 · Updated last year
dbiir / UER-py
Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
☆3,110 · May 9, 2024 · Updated 2 years ago
alibaba / BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
☆932 · Dec 30, 2024 · Updated last year
ShannonAI / service-streamer
Boosting your Web Services of Deep Learning Applications.
☆1,242 · May 13, 2021 · Updated 5 years ago
NVIDIA / TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source compone…
☆13,164 · Jul 7, 2026 · Updated last week
pytorch / TensorRT
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
☆2,977 · Updated this week
facebookresearch / fairscale
PyTorch extensions for high performance and large scale training.
☆3,410 · Apr 26, 2025 · Updated last year
brightmart / roberta_zh
Chinese pre-trained RoBERTa models: RoBERTa for Chinese
☆2,793 · Jul 22, 2024 · Updated last year
Oneflow-Inc / oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
☆9,411 · Dec 4, 2025 · Updated 7 months ago
Tencent / TNN
TNN: developed by Tencent Youtu Lab and Guangying Lab, a uniform deep learning inference framework for mobile, desktop and server. TNN is …
☆4,639 · May 9, 2025 · Updated last year
deepspeedai / DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
☆42,747 · Updated this week
zihangdai / xlnet
XLNet: Generalized Autoregressive Pretraining for Language Understanding
☆6,180 · May 28, 2023 · Updated 3 years ago
Tencent / PatrickStar
PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP and democratizes AI for everyone.
☆773 · Nov 18, 2025 · Updated 8 months ago
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆3,434 · Updated this week
jina-ai / clip-as-service
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
☆12,829 · Jan 23, 2024 · Updated 2 years ago
google-research / text-to-text-transfer-transformer
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
☆6,539 · Jul 8, 2026 · Updated last week
flexflow / flexflow-train
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
☆1,895 · Jul 1, 2026 · Updated 2 weeks ago
laekov / fastmoe
A fast MoE impl for PyTorch
☆1,855 · Feb 10, 2025 · Updated last year
OpenPPL / ppl.nn
A primitive library for neural network
☆1,367 · Nov 24, 2024 · Updated last year
horovod / horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
☆14,695 · Jun 20, 2026 · Updated last month
apache / tvm
Open Machine Learning Compiler Framework
☆13,588 · Updated this week