void-main / fastertransformer_backendLinks

☆21

Alternatives and similar repositories for fastertransformer_backend

Users that are interested in fastertransformer_backend are comparing it to the libraries listed below

Sorting:

void-main / FasterTransformer
Transformer related optimization, including BERT, GPT
☆59Updated 2 years ago
THUDM / FasterTransformer
Transformer related optimization, including BERT, GPT
☆39Updated 2 years ago
OpenPPL / ppl.llm.serving
☆129Updated 10 months ago
bytedance / effective_transformer
Running BERT without Padding
☆475Updated 3 years ago
bytedance / ByteTransformer
optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052
☆479Updated last year
ninehills / llm-inference-benchmark
LLM Inference benchmark
☆428Updated last year
OpenPPL / ppl.nn.llm
☆139Updated last year
modelscope / dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …
☆266Updated 2 months ago
triton-inference-server / fastertransformer_backend
☆413Updated last year
volcengine / veGiantModel
☆219Updated 2 years ago
neuralmagic / AutoFP8
☆205Updated 5 months ago
alibaba / rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
☆903Updated this week
alibaba / ChatLearn
A flexible and efficient training framework for large-scale alignment tasks
☆433Updated this week
Tencent / KsanaLLM
☆508Updated last month
alibaba / Megatron-LLaMA
Best practice for training LLaMA models in Megatron-LM
☆659Updated last year
ProjectD-AI / LLaMA-Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆68Updated 2 years ago
Ascend / AscendSpeed
☆79Updated last year
anyscale / llm-continuous-batching-benchmarks
☆121Updated last year
FlagOpen / FlagScale
FlagScale is a large model toolkit based on open-sourced projects.
☆364Updated this week
luchangli03 / export_llama_to_onnx
export llama to onnx
☆136Updated 9 months ago
alibaba / EasyParallelLibrary
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
☆269Updated 2 years ago
genggui001 / Megatron-DeepSpeed-Llama
☆84Updated 2 years ago
CoinCheung / gdGPT
Train llm (bloom, llama, baichuan2-7b, chatglm3-6b) with deepspeed pipeline mode. Faster than zero/zero++/fsdp.
☆98Updated last year
kwai / Megatron-Kwai
[USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral…
☆66Updated last year
inferflow / inferflow
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
☆248Updated last year
alipay / PainlessInferenceAcceleration
Accelerate inference without tears
☆361Updated last week
triton-inference-server / tensorrtllm_backend
The Triton TensorRT-LLM Backend
☆903Updated this week
HuangLK / transpeeder
train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism
☆224Updated last year
madsys-dev / deepseekv2-profile
☆148Updated 7 months ago
MoFHeka / LLaMA-Megatron
A LLaMA1/LLaMA12 Megatron implement.
☆28Updated last year