PolymathicAI / xVal
Repository for code used in the xVal paper
⭐147 · Updated last year
Alternatives and similar repositories for xVal
Users that are interested in xVal are comparing it to the libraries listed below
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ⭐103 · Updated last year
- A MAD laboratory to improve AI architecture designs 🧪 ⭐136 · Updated last year
- σ-GPT: A New Approach to Autoregressive Models ⭐70 · Updated last year
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ⭐198 · Updated last year
- ⭐82 · Updated last year
- Implementation of the Llama architecture with RLHF + Q-learning ⭐170 · Updated 11 months ago
- Explorations into the recently proposed Taylor Series Linear Attention ⭐100 · Updated last year
- Getting crystal-like representations with harmonic loss ⭐195 · Updated 9 months ago
- Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT ⭐224 · Updated last year
- Implementation of GateLoop Transformer in Pytorch and Jax ⭐91 · Updated last year
- ⭐62 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ⭐185 · Updated 6 months ago
- Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch ⭐91 · Updated 2 years ago
- Explorations into whether a transformer with RL can direct a genetic algorithm to converge faster ⭐71 · Updated 8 months ago
- Implementation of Infini-Transformer in Pytorch ⭐112 · Updated last year
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ⭐121 · Updated last year
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ⭐132 · Updated 2 months ago
- Code accompanying the paper "Generalized Interpolating Discrete Diffusion" ⭐112 · Updated 7 months ago
- Griffin MQA + Hawk Linear RNN Hybrid ⭐88 · Updated last year
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ⭐53 · Updated 2 years ago
- some common Huggingface transformers in maximal update parametrization (µP) ⭐87 · Updated 3 years ago
- Latent Diffusion Language Models ⭐70 · Updated 2 years ago
- ⭐62 · Updated 2 years ago
- Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public ⭐148 · Updated this week
- Minimal (400 LOC) implementation, Maximum (multi-node, FSDP) GPT training ⭐132 · Updated last year
- Understand and test language model architectures on synthetic tasks. ⭐249 · Updated last week
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ⭐150 · Updated 3 months ago
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ⭐124 · Updated 2 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ⭐244 · Updated 7 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ⭐87 · Updated last year