hpcaitech / ColossalAI-Benchmark

Performance benchmarking with ColossalAI

☆39

Alternatives and similar repositories for ColossalAI-Benchmark

Users who are interested in ColossalAI-Benchmark compare it to the repositories listed below.

hpcaitech / TensorNVMe
A Python library that transfers PyTorch tensors between CPU and NVMe
☆120 Updated 10 months ago
stanford-futuredata / stk
☆112 Updated last year
RulinShao / LightSeq
Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
☆215 Updated last year
xuqifan897 / Optimus
☆28 Updated 4 years ago
ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆66 Updated 6 months ago
NUS-HPC-AI-Lab / oh-my-server
☆30 Updated 2 years ago
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆154 Updated last week
thu-pacman / SmartMoE-AE
ATC23 AE
☆47 Updated 2 years ago
thu-pacman / FasterMoE
☆87 Updated 3 years ago
zhuohan123 / terapipe
☆75 Updated 4 years ago
alibaba / easydist
Automated Parallelization System and Infrastructure for Multiple Ecosystems
☆80 Updated 11 months ago
feifeibear / Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
☆77 Updated last year
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆124 Updated 4 months ago
Youhe-Jiang / IJCAI2023-OptimalShardedDataParallel
[IJCAI2023] An automated parallel training system that combines the advantages from both data and model parallelism. If you have any inte…
☆52 Updated 2 years ago
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆221 Updated 2 years ago
DachengLi1 / AMP
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
☆41 Updated 2 years ago
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆118 Updated 3 weeks ago
usyd-fsalab / fp6_llm
Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5).
☆265 Updated 3 months ago
hpcaitech / PaLM-colossalai
Scalable PaLM implementation in PyTorch
☆188 Updated 2 years ago
Victarry / PP-Schedule-Visualization
Pipeline Parallelism Emulation and Visualization
☆67 Updated 4 months ago
FlagOpen / FlagAttention
A collection of memory efficient attention operators implemented in the Triton language.
☆281 Updated last year
yanring / Megatron-MoE-ModelZoo
Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.
☆108 Updated last week
anyscale / llm-continuous-batching-benchmarks
☆121 Updated last year
sail-sg / zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
☆432 Updated 5 months ago
deepspeedai / DeepSpeed-Kernels
☆72 Updated 6 months ago
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆215 Updated this week
cli99 / flops-profiler
pytorch-profiler
☆51 Updated 2 years ago
jundaf2 / INT8-Flash-Attention-FMHA-Quantization
☆158 Updated 2 years ago
exists-forall / striped_attention
☆41 Updated last year
LiuXiaoxuanPKU / GACT-ICML
☆42 Updated 2 years ago