gnovack / distributed-training-and-deepspeed
☆17 · Updated 2 years ago
Alternatives and similar repositories for distributed-training-and-deepspeed
Users interested in distributed-training-and-deepspeed are comparing it to the libraries listed below.
- Various transformers for FSDP research ☆37 · Updated 2 years ago
- ☆114 · Updated last year
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆140 · Updated last year
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆207 · Updated this week
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆87 · Updated last year
- ☆19 · Updated 2 years ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day ☆256 · Updated last year
- A place to store reusable transformer components of my own creation or found on the interwebs ☆59 · Updated last week
- Simple implementation of Speculative Sampling in NumPy for GPT-2. ☆95 · Updated last year
- Experiments with inference on LLaMA ☆104 · Updated last year
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆195 · Updated last week
- Small scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆137 · Updated last year
- ☆83 · Updated last year
- Experiment of using Tangent to autodiff Triton ☆80 · Updated last year
- Torch Distributed Experimental ☆117 · Updated last year
- Supercharge Hugging Face transformers with model parallelism. ☆77 · Updated 2 weeks ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆61 · Updated 10 months ago
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆78 · Updated last year
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆72 · Updated last year
- ML/DL math and method notes ☆63 · Updated last year
- Multipack distributed sampler for fast padding-free training of LLMs ☆199 · Updated last year
- ☆162 · Updated last year
- ☆39 · Updated last year
- A repo covering pseudocode for AI research papers ☆14 · Updated 2 years ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆268 · Updated last year
- ☆48 · Updated 11 months ago
- Minimal PyTorch implementation of BM25 (with sparse tensors) ☆104 · Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset. ☆93 · Updated 2 years ago
- ☆120 · Updated last year
- This project shows how to derive the total number of training tokens from a large text dataset from 🤗 datasets with Apache Beam and Data… ☆27 · Updated 2 years ago