BobMcDear / simsiam-pytorch
PyTorch implementation of SimSiam
☆8 · Updated 2 years ago
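SimSiam's training objective, a symmetrized negative cosine similarity with a stop-gradient on the target branch, can be sketched as below. This is a minimal illustration of the general technique, not code from this repository; the function name and tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def simsiam_loss(p1: torch.Tensor, z1: torch.Tensor,
                 p2: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    """Symmetrized negative cosine similarity between predictor outputs
    (p1, p2) and projections (z1, z2) of two augmented views.

    The stop-gradient (.detach()) on the projection branch is the key
    ingredient that prevents representational collapse in SimSiam.
    """
    # Each term compares one view's prediction to the other view's
    # (detached) projection; averaging over the batch gives a scalar.
    d1 = -F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
    d2 = -F.cosine_similarity(p2, z1.detach(), dim=-1).mean()
    return 0.5 * d1 + 0.5 * d2
```

The loss is bounded in [-1, 1] and reaches -1 when each prediction is perfectly aligned with the other view's projection.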
Alternatives and similar repositories for simsiam-pytorch:
Users interested in simsiam-pytorch are comparing it to the repositories listed below.
- ☆9 · Updated last year
- Blog post ☆16 · Updated last year
- Layerwise Batch Entropy Regularization ☆22 · Updated 2 years ago
- Code for the DDP tutorial ☆32 · Updated 2 years ago
- Code for testing DCT plus Sparse (DCTpS) networks ☆14 · Updated 3 years ago
- ☆24 · Updated 4 months ago
- [ICML 2024] SIRFShampoo: Structured inverse- and root-free Shampoo in PyTorch (https://arxiv.org/abs/2402.03496) ☆14 · Updated 3 months ago
- Parallel Associative Scan for Language Models ☆18 · Updated last year
- Scalable Computation of Hessian Diagonals ☆13 · Updated 8 months ago
- Code accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" ☆27 · Updated 4 years ago
- ☆30 · Updated 3 months ago
- CIFAR10 ResNets implemented in JAX+Flax ☆12 · Updated 2 years ago
- ☆52 · Updated 4 months ago
- The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization". ☆32 · Updated 3 years ago
- ☆31 · Updated 10 months ago
- A repo based on XiLin Li's PSGD repo that extends some of the experiments. ☆14 · Updated 4 months ago
- ☆19 · Updated 2 years ago
- [ICLR 2023] Eva: Practical Second-order Optimization with Kronecker-vectorized Approximation ☆12 · Updated last year
- HomebrewNLP in JAX flavour for maintainable TPU training ☆48 · Updated last year
- Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization ☆16 · Updated 6 years ago
- General Invertible Transformations for Flow-based Generative Models ☆17 · Updated 4 years ago
- Latest Weight Averaging (NeurIPS HITY 2022) ☆28 · Updated last year
- ☆49 · Updated 7 months ago
- Official code for the paper "Attention as a Hypernetwork" ☆24 · Updated 8 months ago
- Code for the PAPA paper ☆27 · Updated 2 years ago
- Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper ☆80 · Updated 3 years ago
- Official repository for Efficient Linear-Time Attention Transformers. ☆18 · Updated 8 months ago
- A PyTorch implementation of the LSTM experiments in the paper: Why Gradient Clipping Accelerates Training: A Theoretical Justification f… ☆44 · Updated 5 years ago
- ☆24 · Updated 2 years ago