bytedance/lightseq
LightSeq: A High Performance Library for Sequence Processing and Generation
☆3,262 · Updated last year
Alternatives and similar repositories for lightseq:
Users who are interested in lightseq are comparing it to the libraries listed below.
- A fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc.) on CPU and GPU. ☆1,519 · Updated last week
- Transformer-related optimization, including BERT and GPT. ☆6,123 · Updated last year
- PyTorch extensions for high-performance and large-scale training. ☆3,298 · Updated last week
- A fast MoE implementation for PyTorch. ☆1,699 · Updated 2 months ago
- Ongoing research training transformer language models at scale, including BERT & GPT-2. ☆2,044 · Updated 3 weeks ago
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ☆8,608 · Updated last week
- Ongoing research training transformer models at scale. ☆12,075 · Updated this week
- A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch. ☆8,616 · Updated last week
- ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab. ☆2,044 · Updated last year
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ☆2,355 · Updated this week
- Ongoing research training transformer language models at scale, including BERT & GPT-2. ☆1,382 · Updated last year
- PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP and democratizes AI for everyone. ☆758 · Updated 2 years ago
- Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training. ☆1,783 · Updated last week
- BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads. ☆858 · Updated 3 months ago
- Example models using DeepSpeed. ☆6,423 · Updated this week
- Fast and memory-efficient exact attention. ☆16,929 · Updated this week
- Bagua Speeds up PyTorch. ☆879 · Updated 8 months ago
- A PyTorch-based knowledge distillation toolkit for natural language processing. ☆1,648 · Updated last year
- A high-performance and generic framework for distributed DNN training. ☆3,675 · Updated last year
- How to optimize some algorithms in CUDA. ☆2,090 · Updated last week
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ☆1,684 · Updated 5 months ago
- Pretrained language models and related optimization techniques developed by Huawei Noah's Ark Lab. ☆3,080 · Updated last year
- An annotated implementation of the Transformer paper. ☆6,157 · Updated last year
- Foundation Architecture for (M)LLMs. ☆3,068 · Updated last year
- Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators. ☆3,137 · Updated last month
- Serve, optimize and scale PyTorch models in production. ☆4,309 · Updated this week
- Open deep learning compiler stack for CPU, GPU and specialized accelerators. ☆12,212 · Updated this week
- EasyTransfer is designed to make the development of transfer learning in NLP applications easier. ☆862 · Updated 2 years ago
- PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT. ☆2,725 · Updated this week
- Accessible large language models via k-bit quantization for PyTorch. ☆6,918 · Updated last week