PolymathicAI / xVal
Repository for code used in the xVal paper
⭐142 · Updated last year
Alternatives and similar repositories for xVal
Users interested in xVal are comparing it to the libraries listed below.
- A MAD laboratory to improve AI architecture designs 🧪 · ⭐127 · Updated 8 months ago
- ⭐82 · Updated last year
- Implementation of the Llama architecture with RLHF + Q-learning · ⭐166 · Updated 6 months ago
- Explorations into the recently proposed Taylor Series Linear Attention · ⭐100 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. · ⭐153 · Updated 2 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" · ⭐101 · Updated 8 months ago
- Implementation of GateLoop Transformer in Pytorch and Jax · ⭐90 · Updated last year
- Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT · ⭐220 · Updated last year
- Griffin MQA + Hawk Linear RNN Hybrid · ⭐88 · Updated last year
- σ-GPT: A New Approach to Autoregressive Models · ⭐67 · Updated last year
- ⭐194 · Updated 3 weeks ago
- Implementation of Infini-Transformer in Pytorch · ⭐111 · Updated 7 months ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… · ⭐53 · Updated last year
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) · ⭐192 · Updated last year
- ⭐56 · Updated 10 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" · ⭐240 · Updated 2 months ago
- Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch · ⭐89 · Updated last year
- Understand and test language model architectures on synthetic tasks. · ⭐222 · Updated last month
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind · ⭐127 · Updated last year
- ⭐101 · Updated last month
- ⭐207 · Updated 8 months ago
- A State-Space Model with Rational Transfer Function Representation. · ⭐79 · Updated last year
- Normalized Transformer (nGPT) · ⭐187 · Updated 9 months ago
- Code accompanying the paper "Generalized Interpolating Discrete Diffusion" · ⭐97 · Updated 2 months ago
- Getting crystal-like representations with harmonic loss · ⭐194 · Updated 4 months ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch · ⭐179 · Updated 2 months ago
- DeMo: Decoupled Momentum Optimization · ⭐190 · Updated 8 months ago
- ⭐53 · Updated last year
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… · ⭐114 · Updated 11 months ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX · ⭐87 · Updated last year