renan-cunha / BatchNormalization
Implementation of Batch Normalization in Numpy and Reproduction of results on MNIST
☆24Updated 3 years ago
Alternatives and similar repositories for BatchNormalization
Users who are interested in BatchNormalization are comparing it to the repositories listed below
- [ICLR 2022] "Deep AutoAugment" by Yu Zheng, Zhi Zhang, Shen Yan, Mi Zhang☆65Updated last year
- A simple program to calculate and visualize the FLOPs and Parameters of Pytorch models, with handy CLI and easy-to-use Python API.☆132Updated last year
- (Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from …☆189Updated last year
- Implementation of "Attention Is Off By One" by Evan Miller☆198Updated 2 years ago
- Implementation of Forward Forward Network proposed by Hinton in NIPS 2022.☆170Updated 2 years ago
- Tutorial for how to build BERT from scratch☆102Updated last year
- several types of attention modules written in PyTorch for learning purposes☆53Updated last month
- PyTorch implementation of moe, which stands for mixture of experts☆52Updated 4 years ago
- A bag of tricks to speed up your deep learning process☆163Updated last year
- [ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models☆73Updated last year
- Implementation of Conv-based and Vit-based networks designed for CIFAR.☆70Updated 3 years ago
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".☆188Updated 2 months ago
- PyTorch implementation of "Effective Approaches to Attention-based Neural Machine Translation" using scheduled sampling to improve the pa…☆20Updated 7 years ago
- code for the ddp tutorial☆32Updated 3 years ago
- The record of what I've been through. Now moved to Notion. See link below☆103Updated last year
- Reproduction of DeepSeek-R1☆242Updated 9 months ago
- Simple and easy to understand PyTorch implementation of Vision Transformer (ViT) from scratch, with detailed steps. Tested on common data…☆153Updated 5 months ago
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time☆505Updated last year
- This repository maintains a collection of important papers on knowledge distillation (awesome-knowledge-distillation).☆82Updated 10 months ago
- A curated list of Early Exiting papers, benchmarks, and misc.☆120Updated 2 years ago
- Fine-tuning Vision Transformers on various classification datasets☆114Updated last year
- Notes on quantization in neural networks☆117Updated 2 years ago
- ☆13Updated 2 years ago
- Official code for Group-Transformer (Scale down Transformer by Grouping Features for a Lightweight Character-level Language Model, COLING…☆27Updated 5 years ago
- PyTorch, PyTorch Lightning framework for trying knowledge distillation in image classification problems☆32Updated last year
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)…☆74Updated 2 years ago
- PDFs and Codelabs for the Efficient Deep Learning book.☆204Updated 2 years ago
- Code release for "Dropout Reduces Underfitting"☆316Updated 2 years ago
- [BMVC 2022] Official repository for "How to Train Vision Transformer on Small-scale Datasets?"☆167Updated 2 years ago
- Root Mean Square Layer Normalization☆261Updated 2 years ago