alxndrTL / othello_mamba
Evaluating the Mamba architecture on the Othello game
⭐ 43 · Updated 6 months ago
Related projects
Alternatives and complementary repositories for othello_mamba
- A MAD laboratory to improve AI architecture designs 🧪 · ⭐ 95 · Updated 6 months ago
- ⭐ 46 · Updated last month
- Griffin MQA + Hawk Linear RNN Hybrid · ⭐ 85 · Updated 6 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. · ⭐ 84 · Updated last week
- Some preliminary explorations of Mamba's context scaling. · ⭐ 191 · Updated 9 months ago
- ⭐ 53 · Updated 10 months ago
- Understand and test language model architectures on synthetic tasks. · ⭐ 162 · Updated 6 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters · ⭐ 104 · Updated last month
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM · ⭐ 50 · Updated 7 months ago
- ⭐ 50 · Updated 6 months ago
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training · ⭐ 113 · Updated 7 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" · ⭐ 214 · Updated 3 months ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX · ⭐ 79 · Updated 9 months ago
- A State-Space Model with Rational Transfer Function Representation. · ⭐ 70 · Updated 6 months ago
- Token Omission Via Attention · ⭐ 120 · Updated last month
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… · ⭐ 78 · Updated 2 months ago
- NanoGPT-like codebase for LLM training · ⭐ 75 · Updated this week
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at DeepMind · ⭐ 112 · Updated 2 months ago
- ⭐ 23 · Updated 8 months ago
- ⭐ 128 · Updated this week
- Normalized Transformer (nGPT) · ⭐ 66 · Updated this week
- ⭐ 29 · Updated 2 months ago
- RWKV, in easy-to-read code · ⭐ 55 · Updated this week
- Implementation of GateLoop Transformer in Pytorch and Jax · ⭐ 86 · Updated 5 months ago
- Fast modular code to create and train cutting-edge LLMs · ⭐ 65 · Updated 6 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" · ⭐ 36 · Updated last year
- ⭐ 45 · Updated 9 months ago
- Minimal but scalable implementation of large language models in JAX · ⭐ 26 · Updated 2 weeks ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… · ⭐ 49 · Updated last year
- ⭐ 76 · Updated 7 months ago