HabanaAI / Megatron-DeepSpeed

Intel Gaudi's Megatron DeepSpeed Large Language Models for training

☆13

Alternatives and similar repositories for Megatron-DeepSpeed

Users that are interested in Megatron-DeepSpeed are comparing it to the libraries listed below

deepspeedai / DeepSpeed-Kernels
☆71 Updated 7 months ago
tridao / flash-attention-wheels
☆57 Updated last year
vedantroy / gpu_kernels
☆27 Updated last year
axonn-ai / axonn
Parallel framework for training and fine-tuning deep neural networks
☆65 Updated last week
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆84 Updated last year
graphcore-research / unit-scaling-demo
Unit Scaling demo and experimentation code
☆16 Updated last year
SqueezeBits / QUICK
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
☆118 Updated last year
IST-DASLab / SparseFinetuning
Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry
☆42 Updated last year
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆130 Updated 10 months ago
stanford-futuredata / stk
☆112 Updated last year
chu-tianxiang / QuIP-for-all
QuIP quantization
☆59 Updated last year
hpcaitech / TensorNVMe
A Python library that transfers PyTorch tensors between CPU and NVMe
☆120 Updated 11 months ago
IST-DASLab / QIGen
Repository for CPU Kernel Generation for LLM Inference
☆26 Updated 2 years ago
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well known deployments.
☆84 Updated last month
argonne-lcf / LLM-Inference-Bench
LLM-Inference-Bench
☆56 Updated 3 months ago
hpcaitech / Elixir
Elixir: Train a Large Language Model on a Small GPU Cluster
☆15 Updated 2 years ago
OpenNLPLab / LASP
Linear Attention Sequence Parallelism (LASP)
☆87 Updated last year
rayleizhu / vllm-ra
[ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts
☆40 Updated last year
abdelfattah-lab / nitro
Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs
☆23 Updated 10 months ago
opengear-project / GEAR
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
☆169 Updated last year
PipeFusion / PipeFusion
A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters
☆52 Updated last year
exists-forall / striped_attention
☆41 Updated last year
casper-hansen / AutoAWQ_kernels
☆78 Updated 11 months ago
cchan / tccl
extensible collectives library in triton
☆90 Updated 7 months ago
Dao-AILab / grouped-latent-attention
☆130 Updated 5 months ago
tile-ai / tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
☆19 Updated last week
efeslab / fiddler
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
☆238 Updated 11 months ago
meta-pytorch / BackendBench
How to ensure correctness and ship LLM generated kernels in PyTorch
☆107 Updated last week
li-plus / flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆52 Updated 3 months ago
sgl-project / tensorrt-demo
TensorRT LLM Benchmark Configuration
☆13 Updated last year