egochao / transformer_with_einsum
Transformer from scratch with einsum method
☆8 · Updated 3 years ago
Related projects:
- Repository for the PopulAtion Parameter Averaging (PAPA) paper ☆26 · Updated 5 months ago
- PyTorch implementation of FNet: Mixing Tokens with Fourier transforms ☆25 · Updated 3 years ago
- We investigated corruption robustness across different architectures including Convolutional Neural Networks, Vision Transformers, and th… ☆15 · Updated 2 years ago
- Implementation of LogAvgExp for PyTorch ☆32 · Updated 2 years ago
- Reproducible code for Augmentation paper ☆18 · Updated 5 years ago
- ☆17 · Updated last year
- ☆21 · Updated this week
- Official code for the paper: "Metadata Archaeology" ☆18 · Updated last year
- ☆11 · Updated 2 weeks ago
- Directed masked autoencoders ☆13 · Updated last year
- ☆48 · Updated 3 months ago
- Experimental implementation for a sparse-dictionary based version of the VQ-VAE2 paper ☆30 · Updated 10 months ago
- ☆19 · Updated last month
- Repo reproducing experimental results in "Addressing the Topological Defects of Disentanglement" ☆23 · Updated 2 years ago
- ☆12 · Updated last year
- [ICML 2024] SINGD: KFAC-like Structured Inverse-Free Natural Gradient Descent (http://arxiv.org/abs/2312.05705) ☆19 · Updated 2 months ago
- PyTorch reimplementation of the paper "HyperMixer: An MLP-based Green AI Alternative to Transformers" [arXiv 2022]. ☆17 · Updated 2 years ago
- Official implementation of the paper "Topographic VAEs learn Equivariant Capsules" ☆77 · Updated 2 years ago
- DiWA: Diverse Weight Averaging for Out-of-Distribution Generalization ☆27 · Updated last year
- Official code for "Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving", ICML 2021 ☆25 · Updated 2 years ago
- An adaptive training algorithm for residual network ☆14 · Updated 4 years ago
- Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention (CVPR 2022) ☆19 · Updated last year
- Reproduces experiments from "Grounding inductive biases in natural images: invariance stems from variations in data" ☆16 · Updated 10 months ago
- ☆21 · Updated last year
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … ☆15 · Updated 10 months ago
- ☆11 · Updated last year
- Anytime Learning At Macroscale ☆9 · Updated 2 years ago
- This is the official implementation for Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models. ☆23 · Updated last year
- [ICLR2024] (EvALign-ICL Benchmark) Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context … ☆20 · Updated 6 months ago
- Energy Based Models are a quite novel technique for density estimation. In this university project I explore this new research topic and … ☆15 · Updated 3 years ago