amirgholami / ai_and_memory_wall
AI and Memory Wall
☆213 · Updated last year
Alternatives and similar repositories for ai_and_memory_wall:
Users interested in ai_and_memory_wall are comparing it to the libraries listed below.
- A baseline repository of Auto-Parallelism in Training Neural Networks ☆143 · Updated 2 years ago
- ☆137 · Updated 8 months ago
- ☆132 · Updated last year
- ☆79 · Updated 4 months ago
- [MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration ☆196 · Updated 2 years ago
- LLM serving cluster simulator ☆94 · Updated 11 months ago
- nnScaler: Compiling DNN models for Parallel Training ☆103 · Updated last month
- Synthesizer for optimal collective communication algorithms ☆105 · Updated 11 months ago
- ☆76 · Updated 2 years ago
- Curated collection of papers in MoE model inference ☆116 · Updated last month
- Automatic Schedule Exploration and Optimization Framework for Tensor Computations ☆176 · Updated 2 years ago
- High-speed GEMV kernels, up to 2.7x speedup over the PyTorch baseline. ☆102 · Updated 8 months ago
- ☆129 · Updated 8 months ago
- LLM Inference analyzer for different hardware platforms ☆54 · Updated last week
- Chimera: bidirectional pipeline parallelism for efficiently training large-scale models. ☆61 · Updated last week
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆212 · Updated 6 months ago
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores ☆50 · Updated last year
- SparseTIR: Sparse Tensor Compiler for Deep Learning ☆135 · Updated last year
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware. ☆107 · Updated 3 months ago
- Microsoft Collective Communication Library ☆343 · Updated last year
- This repository contains integer operators on GPUs for PyTorch. ☆197 · Updated last year
- An experimental parallel training platform ☆54 · Updated last year
- 🔮 Execution time predictions for deep neural network training iterations across different GPUs. ☆60 · Updated 2 years ago
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections ☆119 · Updated 2 years ago
- A low-latency & high-throughput serving engine for LLMs ☆330 · Updated last month
- ☆55 · Updated 9 months ago
- DeepSeek-V3/R1 inference performance simulator ☆89 · Updated this week
- ☆57 · Updated 3 months ago
- ☆94 · Updated last year
- ☆53 · Updated 11 months ago