huggingface / picotron_tutorial
☆224 · Updated last week
Alternatives and similar repositories for picotron_tutorial
Users interested in picotron_tutorial are comparing it to the libraries listed below.
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆313 · Updated last month
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆271 · Updated last week
- An extension of the nanoGPT repository for training small MOE models. ☆215 · Updated 8 months ago
- Load compute kernels from the Hub ☆337 · Updated last week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆196 · Updated 6 months ago
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ☆316 · Updated this week
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆174 · Updated 5 months ago
- ring-attention experiments ☆160 · Updated last year
- Best practices & guides on how to write distributed pytorch training code ☆543 · Updated last month
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆347 · Updated 7 months ago
- ☆317 · Updated this week
- Normalized Transformer (nGPT) ☆194 · Updated last year
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆360 · Updated 11 months ago
- 👷 Build compute kernels ☆190 · Updated this week
- Pytorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support ☆187 · Updated last week
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆561 · Updated last month
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆254 · Updated last week
- HuggingFace conversion and training library for Megatron-based models ☆228 · Updated this week
- Efficient LLM Inference over Long Sequences ☆392 · Updated 5 months ago
- Code for studying the super weight in LLM ☆121 · Updated last year
- ☆222 · Updated 11 months ago
- Memory optimized Mixture of Experts ☆69 · Updated 4 months ago
- PyTorch building blocks for the OLMo ecosystem ☆482 · Updated this week
- Simple & Scalable Pretraining for Neural Architecture Research ☆302 · Updated last month
- ☆91 · Updated last year
- ☆917 · Updated last month
- ☆177 · Updated last year
- ☆546 · Updated last year
- Understand and test language model architectures on synthetic tasks. ☆240 · Updated 2 months ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆146 · Updated last year