PolymathicAI / xVal
Repository for code used in the xVal paper
⭐149 · Updated last year
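For context, the core idea of the xVal paper is a continuous number encoding: every literal number in the input is replaced by a single [NUM] token whose embedding is scaled multiplicatively by the number's value, with a separate scalar head for numeric outputs. A minimal sketch of the embedding side, assuming that reading of the paper (`XValEmbedding` is an illustrative name, not the repo's API):

```python
import torch
import torch.nn as nn

class XValEmbedding(nn.Module):
    """Sketch of xVal-style multiplicative number encoding (illustrative, not the repo's API)."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

    def forward(self, token_ids: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq) with every literal number replaced by a [NUM] token
        # values:    (batch, seq) holding the scalar at [NUM] positions, 1.0 elsewhere
        h = self.embed(token_ids)           # ordinary token embeddings
        return h * values.unsqueeze(-1)     # scale [NUM] embeddings by their numeric value
```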
Alternatives and similar repositories for xVal
Users interested in xVal are comparing it to the libraries listed below.
- A MAD laboratory to improve AI architecture designs 🧪 ⭐137 · Updated last year
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" (sketched after this list) ⭐103 · Updated last year
- Explorations into the recently proposed Taylor Series Linear Attention ⭐100 · Updated last year
- σ-GPT: A New Approach to Autoregressive Models ⭐70 · Updated last year
- ⭐82 · Updated last year
- Implementation of the Llama architecture with RLHF + Q-learning ⭐170 · Updated last year
- Implementation of GateLoop Transformer in Pytorch and Jax ⭐92 · Updated last year
- Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT ⭐224 · Updated last year
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ⭐198 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ⭐186 · Updated 2 weeks ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ⭐53 · Updated 2 years ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ⭐88 · Updated last year
- Implementation of Infini-Transformer in Pytorch ⭐112 · Updated last year
- Understand and test language model architectures on synthetic tasks. ⭐252 · Updated 3 weeks ago
- Pytorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at Deepmind ⭐135 · Updated 3 months ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ⭐92 · Updated 2 years ago
- ⭐208 · Updated 3 weeks ago
- A State-Space Model with Rational Transfer Function Representation. ⭐83 · Updated last year
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ⭐122 · Updated last year
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ⭐247 · Updated 8 months ago
- ⭐62 · Updated last year
- Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch ⭐91 · Updated 2 years ago
- ⭐167 · Updated 2 years ago
- Some common Huggingface transformers in maximal update parametrization (µP) ⭐87 · Updated 3 years ago
- Code accompanying the paper "Generalized Interpolating Discrete Diffusion" ⭐112 · Updated 7 months ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ⭐46 · Updated 3 months ago
- ⭐111 · Updated 6 months ago
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ⭐129 · Updated 2 months ago
- ⭐53 · Updated 2 years ago
- Minimal (400 LOC) implementation of maximum (multi-node, FSDP) GPT training ⭐132 · Updated last year
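As promised above, the Grokfast entry lends itself to a compact illustration: the paper's EMA variant low-pass filters each parameter's gradient and adds the amplified slow component back before the optimizer step. A minimal sketch under that reading (`grokfast_ema` and its defaults are assumptions, not the listed repo's API):

```python
import torch

@torch.no_grad()
def grokfast_ema(model, ema_grads, alpha: float = 0.98, lamb: float = 2.0):
    # ema_grads: dict carried across training steps; pass {} on the first call.
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        if name not in ema_grads:
            ema_grads[name] = torch.zeros_like(p.grad)
        # Low-pass filter: exponential moving average of the raw gradient
        # (the "slow" component).
        ema_grads[name].mul_(alpha).add_(p.grad, alpha=1 - alpha)
        # Amplify the slow component and fold it back into the gradient.
        p.grad.add_(ema_grads[name], alpha=lamb)
    return ema_grads
```

Called between `loss.backward()` and `optimizer.step()`, this biases updates toward the slow gradient direction, which the paper reports accelerates grokking.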