Ascend / AscendSpeed

☆79

Alternatives and similar repositories for AscendSpeed

Users who are interested in AscendSpeed are comparing it to the libraries listed below.

void-main / FasterTransformer
Transformer related optimization, including BERT, GPT
☆59 Updated 2 years ago
OpenPPL / ppl.llm.serving
☆129 Updated 10 months ago
DeepLink-org / dlinfer
☆63 Updated last week
BBuf / megatron-lm-parallel-group-playground
☆16 Updated last year
feifeibear / LLMRoofline
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
☆115 Updated last year
madsys-dev / deepseekv2-profile
☆148 Updated 7 months ago
THUDM / FasterTransformer
Transformer related optimization, including BERT, GPT
☆39 Updated 2 years ago
zms1999 / SmartMoE
A MoE impl for PyTorch, [ATC'23] SmartMoE
☆71 Updated 2 years ago
AniZpZ / AutoSmoothQuant
An easy-to-use package for implementing SmoothQuant for LLMs
☆107 Updated 6 months ago
OpenPPL / ppl.nn.llm
☆139 Updated last year
Oneflow-Inc / models
Models and examples built with OneFlow
☆100 Updated last year
Oneflow-Inc / libai
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
☆407 Updated 2 months ago
Rayrtfr / FasterTransformer
Transformer related optimization, including BERT, GPT
☆17 Updated 2 years ago
FlagOpen / FlagAttention
A collection of memory efficient attention operators implemented in the Triton language.
☆282 Updated last year
InternLM / turbomind
☆97 Updated 7 months ago
volcengine / veGiantModel
☆219 Updated 2 years ago
zhaochenyang20 / ModelServer
Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang
☆58 Updated 11 months ago
modelscope / dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …
☆266 Updated 2 months ago
inferflow / inferflow
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
☆248 Updated last year
anyscale / llm-continuous-batching-benchmarks
☆121 Updated last year
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆156 Updated 2 weeks ago
FlagOpen / FlagScale
FlagScale is a large model toolkit based on open-sourced projects.
☆362 Updated last week
CoinCheung / gdGPT
Train llm (bloom, llama, baichuan2-7b, chatglm3-6b) with deepspeed pipeline mode. Faster than zero/zero++/fsdp.
☆98 Updated last year
bytedance / ByteTransformer
Optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052
☆479 Updated last year
openmlsys / openmlsys-cuda
Tutorials for writing high-performance GPU operators in AI frameworks.
☆132 Updated 2 years ago
ByteDance-Seed / decoupleQ
A quantization algorithm for LLM
☆143 Updated last year
ModelTC / awesome-lm-system
Summary of system papers/frameworks/codes/tools on training or serving large model
☆57 Updated last year
feifeibear / Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
☆78 Updated last year
OpenBMB / BMCook
Model Compression for Big Models
☆165 Updated 2 years ago
OpenBMB / cpm_kernels
☆25 Updated 2 years ago