alxndrTL / othello_mambaLinks

Evaluating the Mamba architecture on the Othello game

☆48

Alternatives and similar repositories for othello_mamba

Users that are interested in othello_mamba are comparing it to the libraries listed below

Sorting:

proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆89Updated last year
epfml / DenseFormer
☆82Updated last year
tanaymeh / mamba-train
A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
☆60Updated last year
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆135Updated 11 months ago
shikaiqiu / compute-better-spent
☆61Updated last year
dvruette / barrel-rec-pytorch
☆53Updated last year
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆174Updated 5 months ago
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆53Updated 2 years ago
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆91Updated last year
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆92Updated last year
berlino / seq_icl
☆53Updated last year
johnma2006 / candle
Deep learning library implemented from scratch in numpy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments.
☆53Updated last year
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆103Updated 11 months ago
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆240Updated 2 months ago
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆131Updated last month
radarFudan / mamba-minimal-jax
☆35Updated last year
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132Updated last year
google-deepmind / spectral_ssm
☆34Updated last year
joey00072 / ohara
Collection of autoregressive model implementation
☆86Updated 7 months ago
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38Updated 5 months ago
SmerkyG / gptcore
Fast modular code to create and train cutting edge LLMs
☆68Updated last year
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆101Updated last year
jopetty / word-problem
Experiments on the impact of depth in transformers and SSMs.
☆38Updated last month
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆243Updated 5 months ago
google-deepmind / asyncdiloco
☆47Updated last year
epfml / llm-baselines
nanoGPT-like codebase for LLM training
☆111Updated 3 weeks ago
jzhang38 / LongMamba
Some preliminary explorations of Mamba's context scaling.
☆217Updated last year
cloneofsimo / min-fsdp
☆91Updated last year
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆127Updated last year
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130Updated last year