minyoungg / platonic-rep
☆499 · Updated 6 months ago
Alternatives and similar repositories for platonic-rep:
Users who are interested in platonic-rep are comparing it to the repositories listed below:
- [ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ☆514 · Updated last week
- ☆421 · Updated 7 months ago
- [ICML 2024 Best Paper] Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (https://arxiv.org/abs/2310.16834) ☆475 · Updated 11 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆291 · Updated 3 months ago
- Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States ☆397 · Updated 6 months ago
- Annotated version of the Mamba paper ☆473 · Updated 11 months ago
- Implementation of Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆502 · Updated 3 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆297 · Updated 2 months ago
- Some preliminary explorations of Mamba's context scaling. ☆213 · Updated last year
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆547 · Updated last month
- ViT Prisma is a mechanistic interpretability library for Vision Transformers (ViTs). ☆205 · Updated this week
- Helpful tools and examples for working with flex-attention ☆647 · Updated this week
- Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793 ☆385 · Updated 2 months ago
- ☆253 · Updated 5 months ago
- Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆306 · Updated 8 months ago
- Simplified Masked Diffusion Language Model ☆273 · Updated 2 months ago
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. ☆398 · Updated 10 months ago
- code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion" ☆730 · Updated this week
- Sparsify transformers with SAEs and transcoders ☆464 · Updated this week
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆273 · Updated 3 months ago
- Muon optimizer: +~30% sample efficiency with <3% wallclock overhead ☆254 · Updated last week
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ☆118 · Updated 5 months ago
- A repository for research on medium sized language models. ☆491 · Updated last month
- [ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation ☆717 · Updated 4 months ago
- ☆171 · Updated last year
- Extracting spatial and temporal world models from LLMs ☆249 · Updated last year
- A bibliography and survey of the papers surrounding o1 ☆1,160 · Updated 3 months ago