zhihanyang2022 / alpha-zero
Minimal AlphaZero in PyTorch, trained on Connect4 on a 6x6 board.
☆ 12 · Updated 2 years ago
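For context, the heart of an AlphaZero-style tree search is PUCT action selection, which balances the network's mean value estimate Q(s, a) against a prior-weighted exploration bonus U(s, a). The sketch below is illustrative only; the class and function names, the `c_puct` constant, and the node layout are assumptions, not code from this repository.

```python
import math

# Minimal sketch of AlphaZero-style PUCT action selection (illustrative;
# names and constants are assumptions, not taken from the alpha-zero repo).
class Node:
    def __init__(self, prior):
        self.prior = prior        # P(s, a) from the policy network
        self.visit_count = 0      # N(s, a)
        self.value_sum = 0.0      # W(s, a), accumulated backed-up values

    def value(self):
        # Mean action value Q(s, a); treat unvisited nodes as 0.
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_action(children, c_puct=1.5):
    """Pick the child action maximizing Q(s, a) + U(s, a), where
    U = c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a))."""
    total_visits = sum(child.visit_count for child in children.values())
    sqrt_total = math.sqrt(max(total_visits, 1))

    def puct(item):
        _action, child = item
        u = c_puct * child.prior * sqrt_total / (1 + child.visit_count)
        return child.value() + u

    return max(children.items(), key=puct)[0]
```

Early in search the U term dominates, so high-prior unvisited moves are tried first; as visit counts grow, selection shifts toward moves with high observed Q.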
Related projects
Alternatives and complementary repositories for alpha-zero
- PyTorch implementation of StyleGAN2 in my style · ☆ 11 · Updated last year
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in PyTorch · ☆ 36 · Updated 2 years ago
- Repository for the paper "CTAB-GAN: Effective Table Data Synthesizing" · ☆ 13 · Updated 2 years ago
- Official repository for Automated Learning Rate Scheduler for Large-Batch Training (8th ICML Workshop on AutoML) · ☆ 39 · Updated 2 years ago
- Visualising Losses in Deep Neural Networks · ☆ 15 · Updated 4 months ago
- Implementation of numerous Vision Transformers in Google's JAX and Flax · ☆ 20 · Updated 2 years ago
- PyTorch implementation of MoE (mixture of experts) · ☆ 32 · Updated 3 years ago
- Implementation of MetaFormer, but in an autoregressive manner · ☆ 23 · Updated 2 years ago
- PyTorch reimplementation of the paper "HyperMixer: An MLP-based Green AI Alternative to Transformers" [arXiv 2022] · ☆ 17 · Updated 2 years ago
- Position Prediction as an Effective Pretraining Strategy · ☆ 8 · Updated last year
- Contains my experiments with the `big_vision` repo to train ViTs on ImageNet-1k · ☆ 22 · Updated last year
- Solution for the Kaggle competition "Feedback Prize - Evaluating Student Writing" · ☆ 17 · Updated 2 years ago
- Implementation of N-Grammer, augmenting Transformers with latent n-grams, in PyTorch · ☆ 72 · Updated last year
- Bag of MLP · ☆ 20 · Updated 3 years ago
- JAX implementation of Graph Attention Networks · ☆ 13 · Updated 2 years ago
- Official PyTorch implementation of Unsupervised Representation Learning for Binary Networks by Joint Classifier Training (CVPR 2022) · ☆ 10 · Updated 2 years ago
- Directed masked autoencoders · ☆ 14 · Updated last year
- LoRA fine-tuned Stable Diffusion deployment · ☆ 31 · Updated last year
- Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi… · ☆ 50 · Updated 2 years ago
- this is for fun, ain't it grand! · ☆ 12 · Updated 7 months ago
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 · ☆ 49 · Updated 2 years ago
- Implementation of Token Shift GPT, an autoregressive model that relies solely on shifting the sequence space for mixing · ☆ 47 · Updated 2 years ago
- [COLM 2024] Early Weight Averaging meets High Learning Rates for LLM Pre-training · ☆ 13 · Updated last month
- DCutMix official repo · ☆ 10 · Updated last year