HFAiLab / pytorch_distributed

Tests of different distributed-training methods on High-Flyer AIHPC

☆24

Alternatives and similar repositories for pytorch_distributed

Users that are interested in pytorch_distributed are comparing it to the libraries listed below


bojone / softtopk
differentiable top-k operator
☆21 · Updated 5 months ago
BBuf / flash-rwkv
☆31 · Updated last year
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆16 · Updated 7 months ago
L1aoXingyu / llm-infer-bench
☆11 · Updated last year
megvii-research / basedet
An object detection codebase based on MegEngine.
☆28 · Updated 2 years ago
feifeibear / Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
☆70 · Updated last year
megvii-research / IntLLaMA
IntLLaMA: A fast and light quantization solution for LLaMA
☆18 · Updated last year
HFAiLab / BEVFormer
☆19 · Updated 2 years ago
microsoft / AttentionEngine
☆71 · Updated last month
tile-ai / TileAttention
☆39 · Updated this week
li-plus / flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆41 · Updated last month
yuzhenmao / IceFormer
Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024).
☆25 · Updated last year
eedalong / Dpex
Distributed DataLoader for PyTorch based on Ray
☆24 · Updated 3 years ago
yester31 / Cutlass_EX
A study of CUTLASS
☆21 · Updated 7 months ago
Oneflow-Inc / vision
Datasets, Transforms and Models specific to Computer Vision
☆85 · Updated last year
cassiewilliam / cuda_op_benchmark
An easily extensible framework for understanding and optimizing CUDA operators, intended for learning purposes only
☆15 · Updated last year
GeeeekExplorer / transformers-patch
Patches for Hugging Face Transformers to save memory
☆24 · Updated 3 weeks ago
ModelTC / awesome-lm-system
Summary of system papers, frameworks, code, and tools for training or serving large models
☆57 · Updated last year
Harry-Chen / InfMoE
Inference framework for MoE layers based on TensorRT with Python binding
☆41 · Updated 4 years ago
Aleph-Alpha-Research / NeurIPS-WANT-submission-efficient-parallelization-layouts
☆22 · Updated last year
OpenNLPLab / Transnormer
[EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer
☆60 · Updated last year
JerryYin777 / Cross-Layer-Attention
Self-reproduction code for the paper "Reducing Transformer Key-Value Cache Size with Cross-Layer Attention" (MIT CSAIL)
☆16 · Updated last year
pangu-tech / pangu-ultra
☆57 · Updated 3 weeks ago
ModelTC / QLLM
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…
☆38 · Updated last year
feifeibear / ChituAttention
Quantized Attention on GPU
☆44 · Updated 7 months ago
HFAiLab / hfai-models
HFAI deep learning models
☆148 · Updated 2 years ago
antgroup / OmniKV
Dynamic Context Selection for Efficient Long-Context LLMs
☆33 · Updated last month
zxytim / arithmetic-encoding-compression
☆11 · Updated 2 years ago
cli99 / flops-profiler
pytorch-profiler
☆51 · Updated 2 years ago
HKUNLP / efficient-attention
[EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling
☆86 · Updated 2 years ago