huggingface / picotron_tutorial
⭐133 · Updated this week
Alternatives and similar repositories for picotron_tutorial:
Users interested in picotron_tutorial are comparing it to the libraries listed below
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… (a minimal SDPA sketch follows the list) ⭐221 · Updated this week
- ⭐75 · Updated 7 months ago
- ring-attention experiments ⭐123 · Updated 4 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ⭐95 · Updated 3 months ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc. ⭐186 · Updated last week
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ⭐116 · Updated 6 months ago
- Normalized Transformer (nGPT) ⭐152 · Updated 3 months ago
- Code for studying the super weight in LLMs ⭐79 · Updated 2 months ago
- Prune transformer layers ⭐67 · Updated 8 months ago
- Minimal (400 LOC) implementation, Maximum (multi-node, FSDP) GPT training ⭐122 · Updated 10 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ⭐167 · Updated last month
- Language models scale reliably with over-training and on downstream tasks ⭐96 · Updated 10 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… (a rough key-value memory sketch follows the list) ⭐295 · Updated 2 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ⭐215 · Updated 2 weeks ago
- Understand and test language model architectures on synthetic tasks. ⭐181 · Updated last month
- ⭐141 · Updated last year
- ⭐192 · Updated 2 months ago
- LLM KV cache compression made easy ⭐394 · Updated this week
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ⭐153 · Updated last month
- Code for training & evaluating Contextual Document Embedding models ⭐173 · Updated last month
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ⭐115 · Updated 2 months ago
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ⭐160 · Updated 2 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ⭐145 · Updated 7 months ago
- Efficient LLM Inference over Long Sequences ⭐357 · Updated this week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ⭐187 · Updated this week
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day ⭐255 · Updated last year
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ⭐270 · Updated last week
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ⭐220 · Updated 2 months ago
- This repository contains the experimental PyTorch native float8 training UX ⭐221 · Updated 6 months ago
- Minimalistic 4D-parallelism distributed training framework for educational purposes ⭐722 · Updated last week
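
The FSDP/SDPA entry near the top of the list refers to PyTorch's built-in scaled dot-product attention, which dispatches to a fused Flash-style kernel when one is available. The snippet below is a minimal sketch of that call, not code from any repository listed here; the tensor shapes are illustrative assumptions.

```python
# Minimal sketch (not from any listed repo): PyTorch-native SDPA, which picks a
# fused Flash-style attention kernel when the hardware and dtypes allow it.
import torch
import torch.nn.functional as F

batch, heads, seq, head_dim = 1, 8, 128, 64  # illustrative shapes
q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, heads, seq, head_dim)
v = torch.randn(batch, heads, seq, head_dim)

# Causal self-attention in one call; no manual softmax(QK^T / sqrt(d)) V needed.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```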
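
The memory-layers entry describes a trainable key-value lookup that adds parameters without a matching increase in FLOPs. Below is a rough sketch of that idea, not the repository's implementation: learned key and value tables, top-k key matches per token, and a softmax-weighted mix of the selected values added back to the residual stream. Real memory layers use product keys so they never score every slot; the dense scoring here is only for readability, and all names and sizes are illustrative assumptions.

```python
# Rough sketch of a trainable key-value memory layer (illustrative, not the
# memory_layers repo's code). Extra capacity lives in the key/value tables,
# while each token only gathers its top-k value rows.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KVMemoryLayer(nn.Module):
    def __init__(self, dim: int, num_slots: int = 4096, top_k: int = 4):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, dim) * dim ** -0.5)
        self.values = nn.Parameter(torch.randn(num_slots, dim) * dim ** -0.5)
        self.query_proj = nn.Linear(dim, dim)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        q = self.query_proj(x)                            # (B, S, D)
        scores = q @ self.keys.t()                        # (B, S, num_slots); real impls avoid this dense scoring
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)           # (B, S, k)
        gathered = self.values[top_idx]                   # (B, S, k, D)
        return x + (weights.unsqueeze(-1) * gathered).sum(dim=-2)

# Usage
layer = KVMemoryLayer(dim=64)
out = layer(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```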