pcg-mlp / KsanaLLM

☆332

Alternatives and similar repositories for KsanaLLM

Users who are interested in KsanaLLM are comparing it to the libraries listed below.


alibaba / rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
☆777 · Updated 2 weeks ago
OpenPPL / ppl.llm.serving
☆127 · Updated 5 months ago
bytedance / ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
☆473 · Updated last year
modelscope / dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …
☆253 · Updated this week
FlagOpen / FlagScale
FlagScale is a large model toolkit based on open-source projects.
☆280 · Updated this week
antgroup / glake
GLake: optimizing GPU memory management and IO transmission.
☆463 · Updated 2 months ago
sgl-project / sgl-learning-materials
Materials for learning SGLang
☆424 · Updated last week
LLMServe / DistServe
Disaggregated serving system for Large Language Models (LLMs).
☆601 · Updated last month
OpenPPL / ppl.nn.llm
☆138 · Updated last year
FlagOpen / FlagGems
FlagGems is an operator library for large language models implemented in the Triton Language.
☆546 · Updated this week
AlibabaPAI / llumnix
Efficient and easy multi-instance LLM serving
☆420 · Updated this week
madsys-dev / deepseekv2-profile
☆137 · Updated 2 months ago
ninehills / llm-inference-benchmark
LLM inference benchmark
☆419 · Updated 10 months ago
alibaba / EasyParallelLibrary
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
☆267 · Updated 2 years ago
void-main / FasterTransformer
Transformer-related optimization, including BERT and GPT
☆59 · Updated last year
alibaba / ChatLearn
A flexible and efficient training framework for large-scale alignment tasks
☆364 · Updated this week
kwai / Megatron-Kwai
[USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral…
☆54 · Updated 10 months ago
feifeibear / long-context-attention
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
☆506 · Updated this week
bytedance / ByteMLPerf
AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver…
☆239 · Updated 2 weeks ago
FlagOpen / FlagPerf
FlagPerf is an open-source software platform for benchmarking AI chips.
☆332 · Updated last week
OpenPPL / ppl.llm.kernel.cuda
☆148 · Updated 4 months ago
FlagOpen / FlagAttention
A collection of memory efficient attention operators implemented in the Triton language.
☆270 · Updated 11 months ago
microsoft / vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
☆378 · Updated last month
hahnyuan / LLM-Viewer
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod…
☆470 · Updated 8 months ago
volcengine / veScale
A PyTorch Native LLM Training Framework
☆811 · Updated 5 months ago
feifeibear / LLMRoofline
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
☆100 · Updated last year
Jack47 / hack-SysML
The road to hacking SysML and becoming a systems expert
☆486 · Updated 8 months ago
alibaba / Megatron-LLaMA
Best practice for training LLaMA models in Megatron-LM
☆654 · Updated last year
volcengine / veGiantModel
☆217 · Updated last year
sail-sg / zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
☆395 · Updated 3 weeks ago