Lightning-Universe / lightning-Hivemind
Lightning Training strategy for HiveMind
☆18 · Updated last month
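The strategy slots into the standard Lightning Trainer. As a rough orientation, a minimal sketch of collaborative training with it might look like the following; the `HivemindStrategy` import path and the `target_batch_size` argument are taken from the package's README as recalled, so treat them as assumptions to verify against the repo.

```python
# Minimal sketch (assumed API) of collaborative training with lightning-Hivemind.
# `HivemindStrategy` and `target_batch_size` follow the package README as recalled;
# verify against the repository before use.
from lightning import Trainer
from lightning_hivemind.strategy import HivemindStrategy

trainer = Trainer(
    max_epochs=1,
    # Peers accumulate gradients until the swarm has collectively processed
    # `target_batch_size` samples, then take one synchronized optimizer step.
    strategy=HivemindStrategy(target_batch_size=8192),
)
# trainer.fit(model, train_dataloader)  # any LightningModule works unchanged
```

Each process launched this way joins (or bootstraps) a Hivemind swarm, so further peers can be started independently and contribute gradients to the same run.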
Alternatives and similar repositories for lightning-Hivemind
Users interested in lightning-Hivemind are comparing it to the libraries listed below.
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆46 · Updated last year
- Example of applying CUDA graphs to LLaMA-v2 ☆12 · Updated 2 years ago
- Experiment using Tangent to autodiff Triton ☆80 · Updated last year
- Torch Distributed Experimental ☆117 · Updated last year
- ☆71 · Updated 8 months ago
- Ship correct and fast LLM kernels to PyTorch ☆124 · Updated 2 weeks ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆146 · Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆61 · Updated last week
- ☆113 · Updated last year
- Memory Optimizations for Deep Learning (ICML 2023) ☆111 · Updated last year
- train with kittens! ☆63 · Updated last year
- Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs ☆110 · Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆210 · Updated last week
- QuIP quantization ☆61 · Updated last year
- Repository for sparse finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆42 · Updated last year
- This repository contains the experimental PyTorch native float8 training UX ☆226 · Updated last year
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆122 · Updated last year
- Official code for "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient" ☆147 · Updated last year
- ☆159 · Updated 2 years ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆217 · Updated this week
- Make Triton easier ☆49 · Updated last year
- A library for unit scaling in PyTorch ☆132 · Updated 4 months ago
- Official implementation of "Training LLMs with MXFP4" ☆110 · Updated 7 months ago
- ☆110 · Updated this week
- ☆14 · Updated 4 months ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆73 · Updated last year
- Parallel framework for training and fine-tuning deep neural networks ☆70 · Updated 3 weeks ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆278 · Updated 2 years ago
- PyTorch Distributed native training library for LLMs/VLMs with out-of-the-box Hugging Face support ☆187 · Updated this week
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆93 · Updated this week
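Of the entries above, vLLM is the one with the most widely documented Python API; a minimal offline-inference sketch follows for comparison with the training-oriented projects. The model name is only a placeholder.

```python
# Minimal offline-inference sketch using vLLM's documented Python API.
# The model identifier is a placeholder; any Hugging Face causal LM works.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Decentralized training over the internet is"], params)
print(outputs[0].outputs[0].text)
```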