bytedance / effective_transformer

Running BERT without Padding

☆475

Alternatives and similar repositories for effective_transformer

Users who are interested in effective_transformer compare it to the libraries listed below.

bytedance / ByteTransformer
optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052
☆479 Updated last year
void-main / FasterTransformer
Transformer related optimization, including BERT, GPT
☆59 Updated 2 years ago
volcengine / veGiantModel
☆219 Updated 2 years ago
alibaba / EasyParallelLibrary
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
☆269 Updated 2 years ago
Oneflow-Inc / DLPerf
DeepLearning Framework Performance Profiling Toolkit
☆292 Updated 3 years ago
triton-inference-server / fastertransformer_backend
☆413 Updated last year
Oneflow-Inc / libai
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
☆407 Updated 2 months ago
sail-sg / zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
☆433 Updated 5 months ago
Azure / MS-AMP
Microsoft Automatic Mixed Precision Library
☆626 Updated last year
Tencent / TurboTransformers
a fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc) on CPU and GPU.
☆1,534 Updated 3 months ago
YellowOldOdd / SDBI
Simple Dynamic Batching Inference
☆145 Updated 3 years ago
FlagOpen / FlagAttention
A collection of memory efficient attention operators implemented in the Triton language.
☆282 Updated last year
alibaba / Megatron-LLaMA
Best practice for training LLaMA models in Megatron-LM
☆659 Updated last year
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆156 Updated 2 weeks ago
Oneflow-Inc / OneFlow-Benchmark
OneFlow models for benchmarking.
☆104 Updated last year
OpenPPL / ppl.llm.serving
☆129 Updated 10 months ago
THUDM / FasterTransformer
Transformer related optimization, including BERT, GPT
☆39 Updated 2 years ago
anyscale / llm-continuous-batching-benchmarks
☆121 Updated last year
NetEase-FuXi / EET
Easy and Efficient Transformer: Scalable Inference Solution for Large NLP Models
☆265 Updated 10 months ago
OpenPPL / ppl.nn.llm
☆139 Updated last year
zhihu / cuBERT
Fast implementation of BERT inference directly on NVIDIA (CUDA, CUBLAS) and Intel MKL
☆546 Updated 4 years ago
alibaba / BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
☆901 Updated 9 months ago
qsyao / cudaBERT
A Fast Multi-processing BERT-Inference System
☆101 Updated 3 years ago
cli99 / llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
☆460 Updated 6 months ago
DeepRec-AI / HybridBackend
A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters
☆159 Updated last year
Oneflow-Inc / oneflow-documentation
oneflow documentation
☆69 Updated last year
microsoft / nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
☆994 Updated last year
void-main / fastertransformer_backend
☆21 Updated 2 years ago
pytorch / PiPPy
Pipeline Parallelism for PyTorch
☆780 Updated last year
Ascend / AscendSpeed
☆79 Updated last year