HFAiLab / pytorch_distributed
Tests of different distributed-training methods on High-Flyer AIHPC
☆24 Updated 2 years ago
Alternatives and similar repositories for pytorch_distributed:
Users interested in pytorch_distributed are comparing it to the libraries listed below
- ☆11 Updated last year
- Benchmark tests supporting the TiledCUDA library. ☆16 Updated 5 months ago
- differentiable top-k operator ☆21 Updated 3 months ago
- ☆30 Updated 11 months ago
- ☆78 Updated last year
- ☆16 Updated last year
- An object detection codebase based on MegEngine. ☆28 Updated 2 years ago
- A simple calculation for LLM MFU. ☆36 Updated last month
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆15 Updated this week
- Datasets, Transforms and Models specific to Computer Vision ☆85 Updated last year
- OneFlow Serving ☆20 Updated 2 weeks ago
- study of cutlass ☆21 Updated 5 months ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆18 Updated this week
- ☆67 Updated this week
- ☆11 Updated this week
- ☆12 Updated 2 years ago
- An external memory allocator example for PyTorch. ☆14 Updated 3 years ago
- PyTorch Dataset Rank Dataset ☆42 Updated 4 years ago
- IntLLaMA: A fast and light quantization solution for LLaMA ☆18 Updated last year
- A small framework that mimics PyTorch using CuPy or NumPy ☆27 Updated 3 years ago
- ☆22 Updated last year
- ☆19 Updated 2 years ago
- TensorRT LLM Benchmark Configuration ☆13 Updated 9 months ago
- Inference framework for MoE layers based on TensorRT with Python binding ☆41 Updated 3 years ago
- Odysseus: Playground of LLM Sequence Parallelism ☆68 Updated 10 months ago
- GPTQ inference TVM kernel ☆38 Updated last year
- Step-by-step SGEMM optimization with CUDA ☆18 Updated last year
- A beginner's introductory tutorial on model compression ☆22 Updated 9 months ago
- A MoE impl for PyTorch, [ATC'23] SmartMoE ☆62 Updated last year
- [CVPR-2023] Towards Any Structural Pruning ☆16 Updated 2 years ago