IST-DASLab / torch_cgx
PyTorch distributed backend extension with compression support
☆16 Updated 7 months ago
Alternatives and similar repositories for torch_cgx
Users that are interested in torch_cgx are comparing it to the libraries listed below
- Extensible collectives library in Triton ☆90 Updated 7 months ago
- Distributed MoE in a Single Kernel [NeurIPS '25] ☆89 Updated last month
- Collection of kernels written in Triton language ☆161 Updated 7 months ago
- A resilient distributed training framework ☆96 Updated last year
- A schedule language for large model training ☆151 Updated 2 months ago
- ☆158 Updated last year
- ☆75 Updated 3 weeks ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆126 Updated 5 months ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆255 Updated 3 weeks ago
- Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling ☆13 Updated last year
- [IJCAI2023] An automated parallel training system that combines the advantages from both data and model parallelism. If you have any inte… ☆52 Updated 2 years ago
- ☆76 Updated 4 years ago
- Fast low-bit matmul kernels in Triton ☆392 Updated 2 weeks ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆222 Updated 2 years ago
- ☆180 Updated last year
- nnScaler: Compiling DNN models for Parallel Training ☆118 Updated last month
- Triton-based Symmetric Memory operators and examples ☆61 Updated 3 weeks ago
- ☆25 Updated 2 years ago
- GitHub mirror of the triton-lang/triton repo. ☆98 Updated this week
- ☆246 Updated this week
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆85 Updated last year
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving ☆326 Updated last year
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning ☆125 Updated last week
- ☆93 Updated last year
- Microsoft Collective Communication Library ☆66 Updated 11 months ago
- ☆58 Updated last year
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆42 Updated 3 years ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆23 Updated 5 months ago
- Stateful LLM Serving ☆88 Updated 7 months ago
- Latency and Memory Analysis of Transformer Models for Training and Inference ☆461 Updated 6 months ago