huggingface / picotron_tutorial
⭐158 · Updated last month
Alternatives and similar repositories for picotron_tutorial:
Users interested in picotron_tutorial are comparing it to the libraries listed below.
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… (a minimal SDPA sketch follows this list) ⭐232 · Updated 2 weeks ago
- PyTorch building blocks for the OLMo ecosystem ⭐165 · Updated this week
- ring-attention experiments ⭐127 · Updated 5 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… (a toy key-value lookup sketch follows this list) ⭐307 · Updated 3 months ago
- ⭐76 · Updated 8 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ⭐100 · Updated 4 months ago
- Best practices & guides on how to write distributed pytorch training code ⭐373 · Updated 3 weeks ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc. ⭐224 · Updated last week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ⭐123 · Updated 3 months ago
- Code for studying the super weight in LLM ⭐91 · Updated 3 months ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ⭐122 · Updated 7 months ago
- Prune transformer layers ⭐68 · Updated 9 months ago
- Understand and test language model architectures on synthetic tasks. ⭐185 · Updated 2 weeks ago
- Triton-based implementation of Sparse Mixture of Experts. ⭐207 · Updated 3 months ago
- Normalized Transformer (nGPT) ⭐162 · Updated 4 months ago
- This repository contains the experimental PyTorch native float8 training UX ⭐222 · Updated 7 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ⭐277 · Updated 3 weeks ago
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ⭐164 · Updated 3 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ⭐209 · Updated this week
- Cataloging released Triton kernels. ⭐204 · Updated 2 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ⭐154 · Updated 9 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ⭐189 · Updated this week
- Language models scale reliably with over-training and on downstream tasks ⭐96 · Updated 11 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ⭐108 · Updated 3 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ⭐58 · Updated last month
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ⭐223 · Updated last month
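
Two of the entries above name mechanisms that are easy to miss from a one-line description, so here are two hedged sketches. First, the FSDP/SDPA entry: a minimal sketch in plain PyTorch (not code from that repository) of causal attention built on `torch.nn.functional.scaled_dot_product_attention`, which dispatches to a fused Flash-style kernel when one is available; in a full pre-training setup the surrounding transformer would additionally be wrapped with FSDP.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Minimal causal self-attention on top of PyTorch's SDPA; illustrative only."""
    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        b, s, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # SDPA expects (batch, heads, seq, head_dim)
        q, k, v = (t.view(b, s, self.n_heads, self.head_dim).transpose(1, 2) for t in (q, k, v))
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(y.transpose(1, 2).reshape(b, s, d))

# Single-process smoke test; a distributed run would wrap the full model with
# torch.distributed.fsdp.FullyShardedDataParallel after initializing a process group.
attn = CausalSelfAttention()
print(attn(torch.randn(2, 32, 128)).shape)  # torch.Size([2, 32, 128])
```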
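
Second, the memory-layers entry: a toy sketch of a trainable key-value lookup. All names here are illustrative, not from the linked repository. Real memory layers factorize the keys (product keys) so they never score every slot; this toy version scores all slots for clarity but still only mixes in the top-k values, which is how extra parameters are added without a proportional increase in per-token compute.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMemoryLayer(nn.Module):
    """Toy key-value memory (illustrative): each token queries a large key table,
    and only the top-k matching value vectors are mixed back into the residual."""
    def __init__(self, d_model: int, num_slots: int = 4096, k: int = 8):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)
        self.values = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, seq, d_model)
        scores = x @ self.keys.t()                         # (batch, seq, num_slots)
        top_scores, top_idx = scores.topk(self.k, dim=-1)  # keep k slots per token
        weights = F.softmax(top_scores, dim=-1)            # (batch, seq, k)
        gathered = self.values[top_idx]                    # (batch, seq, k, d_model)
        return x + (weights.unsqueeze(-1) * gathered).sum(dim=-2)

layer = SimpleMemoryLayer(d_model=64)
print(layer(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 64])
```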