mosaicml / streaming
A Data Streaming Library for Efficient Neural Network Training
☆1,076 · updated this week
Related projects:
- Minimalistic large language model 3D-parallelism training (☆1,111, updated this week)
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… (☆1,519, updated 7 months ago)
- Fast and flexible reference benchmarks (☆435, updated last month)
- A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch. (☆2,206, updated this week)
- Pipeline Parallelism for PyTorch (☆708, updated 3 weeks ago)
- Transform datasets at scale. Optimize datasets for fast AI model training. (☆318, updated this week)
- What would you do with 1000 H100s... (☆816, updated 8 months ago)
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. (☆1,843, updated last week)
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster. (☆994, updated 5 months ago)
- PyTorch native quantization and sparsity for training and inference (☆726, updated this week)
- Maximal update parametrization (µP) (☆1,334, updated 2 months ago)
- The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training” (☆931, updated 7 months ago)
- TensorDict is a PyTorch-dedicated tensor container. (☆807, updated this week)
- PyTorch extensions for high performance and large scale training. (☆3,149, updated 2 weeks ago)
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs… (☆1,811, updated this week)
- Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors a… (☆1,131, updated this week)
- Freeing data processing from scripting madness by providing a set of platform-agnostic, customizable pipeline processing blocks. (☆1,935, updated last week)
- Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data. (☆972, updated last month)
- Cramming the training of a (BERT-type) language model into limited compute. (☆1,284, updated 3 months ago)
- Serving multiple LoRA-finetuned LLMs as one (☆946, updated 4 months ago)
- Puzzles for learning Triton (☆966, updated this week)
- 🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools (☆2,459, updated this week)
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch (☆451, updated last month)
- 🤖 A PyTorch library of curated Transformer models and their composable components (☆861, updated 5 months ago)
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters (☆1,698, updated 7 months ago)
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments. (☆715, updated last month)
- A PyTorch quantization backend for optimum (☆758, updated this week)
- Fast & simple repository for pre-training and fine-tuning T5-style models (☆957, updated 3 weeks ago)
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (☆2,333, updated 2 months ago)