mosaicml / streaming
A Data Streaming Library for Efficient Neural Network Training
☆1,076 · updated this week
Related projects:
- Minimalistic large language model 3D-parallelism training (☆1,111, updated this week)
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… (☆1,519, updated 7 months ago)
- Fast and flexible reference benchmarks (☆435, updated last month)
- A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch. (☆2,206, updated this week)
- Pipeline Parallelism for PyTorch (☆708, updated 3 weeks ago)
- Transform datasets at scale. Optimize datasets for fast AI model training. (☆318, updated this week)
- What would you do with 1000 H100s... (☆816, updated 8 months ago)
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. (☆1,843, updated last week)
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster. (☆994, updated 5 months ago)
- PyTorch native quantization and sparsity for training and inference (☆726, updated this week)
- Maximal update parametrization (µP) (☆1,334, updated 2 months ago)
- The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training” (☆931, updated 7 months ago)
- TensorDict is a PyTorch-dedicated tensor container. (☆807, updated this week)
- PyTorch extensions for high performance and large scale training. (☆3,149, updated 2 weeks ago)
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs… (☆1,811, updated this week)
- Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors a… (☆1,131, updated this week)
- Freeing data processing from scripting madness by providing a set of platform-agnostic, customizable pipeline processing blocks. (☆1,935, updated last week)
- Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data. (☆972, updated last month)
- Cramming the training of a (BERT-type) language model into limited compute. (☆1,284, updated 3 months ago)
- Serving multiple LoRA-finetuned LLMs as one (☆946, updated 4 months ago)
- Puzzles for learning Triton (☆966, updated this week)
- 🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools (☆2,459, updated this week)
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch (☆451, updated last month)
- 🤖 A PyTorch library of curated Transformer models and their composable components (☆861, updated 5 months ago)
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters (☆1,698, updated 7 months ago)
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments. (☆715, updated last month)
- A PyTorch quantization backend for optimum (☆758, updated this week)
- Fast & simple repository for pre-training and fine-tuning T5-style models (☆957, updated 3 weeks ago)
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (☆2,333, updated 2 months ago)