Lightning-Universe / lightning-Hivemind
Lightning Training strategy for HiveMind
☆18 · Updated last month
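The strategy slots into the standard Lightning Trainer. As a rough orientation, a minimal sketch of collaborative training with it might look like the following; the `HivemindStrategy` import path and the `target_batch_size` argument are taken from the package's README as recalled, so treat them as assumptions to verify against the repo.

```python
# Minimal sketch (assumed API) of collaborative training with lightning-Hivemind.
# `HivemindStrategy` and `target_batch_size` follow the package README as recalled;
# verify against the repository before use.
from lightning import Trainer
from lightning_hivemind.strategy import HivemindStrategy

trainer = Trainer(
    max_epochs=1,
    # Peers accumulate gradients until the swarm has collectively processed
    # `target_batch_size` samples, then take one synchronized optimizer step.
    strategy=HivemindStrategy(target_batch_size=8192),
)
# trainer.fit(model, train_dataloader)  # any LightningModule works unchanged
```

Each process launched this way joins (or bootstraps) a Hivemind swarm, so further peers can be started independently and contribute gradients to the same run.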
Alternatives and similar repositories for lightning-Hivemind
Users interested in lightning-Hivemind are comparing it to the libraries listed below.
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆46 · Updated last year
- Example of applying CUDA graphs to LLaMA-v2 ☆12 · Updated 2 years ago
- Experiment using Tangent to autodiff Triton ☆80 · Updated last year
- Torch Distributed Experimental ☆117 · Updated last year
- ☆71 · Updated 8 months ago
- Ship correct and fast LLM kernels to PyTorch ☆124 · Updated 2 weeks ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆146 · Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆61 · Updated last week
- ☆113 · Updated last year
- Memory Optimizations for Deep Learning (ICML 2023) ☆111 · Updated last year
- train with kittens! ☆63 · Updated last year
- Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs ☆110 · Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆210 · Updated last week
- QuIP quantization ☆61 · Updated last year
- Repository for sparse finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆42 · Updated last year
- This repository contains the experimental PyTorch native float8 training UX ☆226 · Updated last year
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆122 · Updated last year
- Official code for "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient" ☆147 · Updated last year
- ☆159 · Updated 2 years ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆217 · Updated this week
- Make Triton easier ☆49 · Updated last year
- A library for unit scaling in PyTorch ☆132 · Updated 4 months ago
- Official implementation of "Training LLMs with MXFP4" ☆110 · Updated 7 months ago
- ☆110 · Updated this week
- ☆14 · Updated 4 months ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆73 · Updated last year
- Parallel framework for training and fine-tuning deep neural networks ☆70 · Updated 3 weeks ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆278 · Updated 2 years ago
- PyTorch Distributed native training library for LLMs/VLMs with out-of-the-box Hugging Face support ☆187 · Updated this week
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆93 · Updated this week
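Of the entries above, vLLM is the one with the most widely documented Python API; a minimal offline-inference sketch follows for comparison with the training-oriented projects. The model name is only a placeholder.

```python
# Minimal offline-inference sketch using vLLM's documented Python API.
# The model identifier is a placeholder; any Hugging Face causal LM works.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Decentralized training over the internet is"], params)
print(outputs[0].outputs[0].text)
```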