bytedance/lightseq
LightSeq: A High Performance Library for Sequence Processing and Generation
☆3,262 · Updated last year
Alternatives and similar repositories for lightseq:
Users who are interested in lightseq are comparing it to the libraries listed below.
- A fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc.) on CPU and GPU. ☆1,519 · Updated last week
- Transformer-related optimization, including BERT and GPT. ☆6,123 · Updated last year
- PyTorch extensions for high-performance and large-scale training. ☆3,298 · Updated last week
- A fast MoE implementation for PyTorch. ☆1,699 · Updated 2 months ago
- Ongoing research training transformer language models at scale, including BERT & GPT-2. ☆2,044 · Updated 3 weeks ago
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ☆8,608 · Updated last week
- Ongoing research training transformer models at scale. ☆12,075 · Updated this week
- A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch. ☆8,616 · Updated last week
- ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab. ☆2,044 · Updated last year
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ☆2,355 · Updated this week
- Ongoing research training transformer language models at scale, including BERT & GPT-2. ☆1,382 · Updated last year
- PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP and democratizes AI for everyone. ☆758 · Updated 2 years ago
- Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training. ☆1,783 · Updated last week
- BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads. ☆858 · Updated 3 months ago
- Example models using DeepSpeed. ☆6,423 · Updated this week
- Fast and memory-efficient exact attention. ☆16,929 · Updated this week
- Bagua Speeds up PyTorch. ☆879 · Updated 8 months ago
- A PyTorch-based knowledge distillation toolkit for natural language processing. ☆1,648 · Updated last year
- A high-performance and generic framework for distributed DNN training. ☆3,675 · Updated last year
- How to optimize some algorithms in CUDA. ☆2,090 · Updated last week
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ☆1,684 · Updated 5 months ago
- Pretrained language models and related optimization techniques developed by Huawei Noah's Ark Lab. ☆3,080 · Updated last year
- An annotated implementation of the Transformer paper. ☆6,157 · Updated last year
- Foundation Architecture for (M)LLMs. ☆3,068 · Updated last year
- Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators. ☆3,137 · Updated last month
- Serve, optimize and scale PyTorch models in production. ☆4,309 · Updated this week
- Open deep learning compiler stack for CPU, GPU and specialized accelerators. ☆12,212 · Updated this week
- EasyTransfer is designed to make the development of transfer learning in NLP applications easier. ☆862 · Updated 2 years ago
- PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT. ☆2,725 · Updated this week
- Accessible large language models via k-bit quantization for PyTorch. ☆6,918 · Updated last week