HabanaAI / Megatron-DeepSpeed
Intel Gaudi's Megatron-DeepSpeed for training large language models
☆16 · Updated last year
Alternatives and similar repositories for Megatron-DeepSpeed
Users interested in Megatron-DeepSpeed are comparing it to the libraries listed below.
- ☆71 · Updated 9 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆124 · Updated last year
- ☆27 · Updated 2 years ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆63 · Updated 6 months ago
- ☆61 · Updated 2 years ago
- [ICLR 2025] Breaking the Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆136 · Updated last year
- Framework to reduce autotune overhead to zero for well-known deployments ☆92 · Updated 3 months ago
- Parallel framework for training and fine-tuning deep neural networks ☆70 · Updated 2 months ago
- LLM-Inference-Bench ☆56 · Updated 6 months ago
- ☆78 · Updated last year
- Efficient compute-communication overlap for distributed LLM inference ☆68 · Updated 2 months ago
- LLM Serving Performance Evaluation Harness ☆82 · Updated 10 months ago
- Explore training for quantized models ☆26 · Updated 6 months ago
- Boosting 4-bit inference kernels with 2:4 sparsity ☆91 · Updated last year
- Open deep learning compiler stack for CPU, GPU, and specialized accelerators ☆19 · Updated this week
- Example of applying CUDA graphs to LLaMA-v2 ☆12 · Updated 2 years ago
- ☆115 · Updated last year
- ☆96 · Updated 9 months ago
- ☆56 · Updated this week
- PyTorch bindings for CUTLASS grouped GEMM ☆140 · Updated 7 months ago
- NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference ☆69 · Updated last year
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆40 · Updated 5 months ago
- ☆124 · Updated last year
- Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference ☆46 · Updated 7 months ago
- ☆56 · Updated last year
- Unit Scaling demo and experimentation code ☆16 · Updated last year
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning ☆59 · Updated 2 months ago
- ☆85 · Updated 11 months ago
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆118 · Updated last year
- LLM checkpointing for DeepSpeed/Megatron ☆23 · Updated last month