minyoungg / platonic-rep
☆507 · Updated 7 months ago
Alternatives and similar repositories for platonic-rep:
Users interested in platonic-rep are comparing it to the repositories listed below.
- Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States ☆398 · Updated 7 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆300 · Updated 4 months ago
- [ICML 2024 Best Paper] Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (https://arxiv.org/abs/2310.16834) ☆530 · Updated last year
- [ICLR 2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ☆535 · Updated last month
- ViT Prisma is a mechanistic interpretability library for Vision Transformers (ViTs). ☆214 · Updated last week
- ☆441 · Updated 8 months ago
- Annotated version of the Mamba paper ☆475 · Updated last year
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆276 · Updated this week
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆506 · Updated 4 months ago
- Muon optimizer: >30% sample efficiency with <3% wallclock overhead ☆521 · Updated 2 weeks ago
- [ICML 2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation ☆740 · Updated 5 months ago
- Code for Adam-mini: Use Fewer Learning Rates To Gain More (https://arxiv.org/abs/2406.16793) ☆398 · Updated 3 months ago
- Some preliminary explorations of Mamba's context scaling. ☆212 · Updated last year
- Helpful tools and examples for working with flex-attention ☆695 · Updated this week
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… (see the sketch after this list) ☆307 · Updated 3 months ago
- Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion" ☆777 · Updated last week
- Simple and Effective Masked Diffusion Language Model ☆346 · Updated 2 weeks ago
- GPT-4-based personalized arXiv paper assistant bot ☆511 · Updated 11 months ago
- Large Context Attention ☆693 · Updated 2 months ago
- Sparsify transformers with SAEs and transcoders ☆494 · Updated this week
- Training Large Language Models to Reason in a Continuous Latent Space ☆985 · Updated last month
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆548 · Updated 2 months ago
- Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch ☆270 · Updated 10 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM) ☆209 · Updated this week
- A bibliography and survey of the papers surrounding o1 ☆1,182 · Updated 4 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ☆186 · Updated 9 months ago
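
The memory-layers entry in the list above describes the core mechanism: a trainable key-value lookup that adds parameters without adding FLOPs. Below is a minimal PyTorch sketch of that idea under simplifying assumptions, not the repository's actual implementation: `MemoryLayer`, `num_keys`, and `topk` are illustrative names and values, and the key scoring here is dense for clarity (real implementations use product-key decompositions so lookup stays sub-linear in the number of keys).

```python
# Minimal sketch of a trainable key-value memory layer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryLayer(nn.Module):
    def __init__(self, dim: int, num_keys: int = 1024, topk: int = 4):
        super().__init__()
        # Trainable memory: parameter count grows with num_keys, but the
        # per-token value aggregation below only touches topk rows.
        self.keys = nn.Parameter(torch.randn(num_keys, dim) / dim**0.5)
        self.values = nn.Parameter(torch.randn(num_keys, dim) / dim**0.5)
        self.topk = topk

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        scores = x @ self.keys.t()               # (batch, seq, num_keys)
        w, idx = scores.topk(self.topk, dim=-1)  # top-k keys per token
        w = F.softmax(w, dim=-1)                 # normalize selected scores
        v = self.values[idx]                     # (batch, seq, topk, dim)
        return (w.unsqueeze(-1) * v).sum(-2)     # sparse weighted sum of values

x = torch.randn(2, 16, 64)
print(MemoryLayer(64)(x).shape)  # torch.Size([2, 16, 64])
```

The sparsity comes from the top-k selection: only `topk` value rows per token enter the output, so capacity can be scaled by raising `num_keys` without changing the aggregation cost.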