hundredblocks / large-model-parallelism
Functional local implementations of main model parallelism approaches
⭐95 · Updated 2 years ago
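As background for the topic of this repository, here is a minimal illustrative sketch of one common model parallelism approach: column-wise tensor parallelism for a single linear layer, simulated locally with PyTorch. Tensor parallelism is one of the techniques usually grouped under model parallelism, alongside pipeline and data parallelism. This is not code from large-model-parallelism; the helper name `column_parallel_linear` and the two-shard setup are assumptions made purely for illustration.

```python
# Illustrative sketch only (not from this repository): column-wise tensor
# parallelism for a single linear layer, simulated on CPU with two weight
# shards standing in for two devices.
import torch

def column_parallel_linear(x, w_shards, b_shards):
    # Hypothetical helper: each shard computes a slice of the output features;
    # concatenating the partial outputs plays the role of an all-gather.
    return torch.cat([x @ w + b for w, b in zip(w_shards, b_shards)], dim=-1)

torch.manual_seed(0)
d_in, d_out = 8, 16
w = torch.randn(d_in, d_out)
b = torch.randn(d_out)
x = torch.randn(4, d_in)

# Shard the weight columns (and matching bias entries) across two "workers".
w_shards = torch.chunk(w, 2, dim=1)
b_shards = torch.chunk(b, 2, dim=0)

# The sharded computation matches the unsharded reference layer.
assert torch.allclose(column_parallel_linear(x, w_shards, b_shards), x @ w + b)
```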
Alternatives and similar repositories for large-model-parallelism
Users interested in large-model-parallelism are comparing it to the libraries listed below.
- ⭐94 · Updated 2 years ago
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ⭐86 · Updated 2 years ago
- Train very large language models in Jax. ⭐210 · Updated 2 years ago
- An interactive exploration of Transformer programming. ⭐271 · Updated 2 years ago
- Project 2 (Building Large Language Models) for Stanford CS324: Understanding and Developing Large Language Models (Winter 2022) ⭐105 · Updated 2 years ago
- A puzzle to learn about prompting ⭐135 · Updated 2 years ago
- git extension for {collaborative, communal, continual} model development ⭐217 · Updated last year
- JAX implementation of the Llama 2 model ⭐216 · Updated last year
- Automatic gradient descent ⭐217 · Updated 2 years ago
- Inference code for LLaMA models in JAX ⭐120 · Updated last year
- Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference… ⭐216 · Updated 2 weeks ago
- Comprehensive analysis of the difference in performance of QLoRA, LoRA, and full fine-tunes. ⭐83 · Updated 2 years ago
- Pre-train BERT from scratch, with HuggingFace. Accompanies the blog post: sidsite.com/posts/bert-from-scratch ⭐43 · Updated 8 months ago
- gzip Predicts Data-dependent Scaling Laws ⭐34 · Updated last year
- ⭐144 · Updated 2 years ago
- A lightweight PyTorch implementation of the Transformer-XL architecture proposed by Dai et al. (2019) ⭐37 · Updated 2 years ago
- Resources from the EleutherAI Math Reading Group ⭐54 · Updated 11 months ago
- ⭐53 · Updated 2 years ago
- ⭐92 · Updated last year
- ⭐62 · Updated 2 years ago
- ⭐68 · Updated last year
- HomebrewNLP in JAX flavour for maintainable TPU training ⭐51 · Updated 2 years ago
- ML/DL Math and Method notes ⭐66 · Updated 2 years ago
- The GeoV model is a large language model designed by Georges Harik and uses Rotary Positional Embeddings with Relative distances (RoPER)… ⭐121 · Updated 2 years ago
- Minimal (400 LOC) implementation of maximum (multi-node, FSDP) GPT training ⭐132 · Updated last year
- Multi-Domain Expert Learning ⭐67 · Updated 2 years ago
- Used for adaptive human-in-the-loop evaluation of language and embedding models. ⭐308 · Updated 2 years ago
- A library for squeakily cleaning and filtering language datasets. ⭐49 · Updated 2 years ago
- A case study of efficient training of large language models using commodity hardware. ⭐68 · Updated 3 years ago
- Minimal example scripts of the Hugging Face Trainer, focused on staying under 150 lines ⭐196 · Updated last year