huggingface / paper-style-guide
☆72 Updated 3 years ago
Alternatives and similar repositories for paper-style-guide:
Users who are interested in paper-style-guide are comparing it to the repositories listed below:
- Figures I made during my PhD in Deep Learning, for my models and for context ☆79 Updated 3 years ago
- Code for the anonymous submission "Cockpit: A Practical Debugging Tool for Training Deep Neural Networks" ☆31 Updated 4 years ago
- Implementation of the Remixer Block from the Remixer paper, in Pytorch ☆35 Updated 3 years ago
- Loss and accuracy go opposite ways...right? ☆91 Updated 4 years ago
- A small framework that mimics PyTorch using CuPy or NumPy ☆27 Updated 3 years ago
- ☆23 Updated 2 years ago
- A python library for highly configurable transformers - easing model architecture search and experimentation. ☆49 Updated 3 years ago
- An implementation of Transformer with Expire-Span, a circuit for learning which memories to retain ☆33 Updated 4 years ago
- 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. ☆82 Updated 3 years ago
- a lightweight transformer library for PyTorch ☆71 Updated 3 years ago
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021) ☆60 Updated 2 years ago
- ☆37 Updated last year
- Code for Multi-Head Attention: Collaborate Instead of Concatenate ☆152 Updated last year
- Code for the paper "Query-Key Normalization for Transformers" ☆38 Updated 4 years ago
- The accompanying code for "Memory-efficient Transformers via Top-k Attention" (Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonatha… ☆65 Updated 3 years ago
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in Pytorch ☆37 Updated 2 years ago
- [EMNLP'19] Summary for Transformer Understanding ☆53 Updated 5 years ago
- Toy implementations of some popular ML optimizers using Python/JAX ☆44 Updated 3 years ago
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆49 Updated 2 years ago
- Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences" ☆70 Updated last year
- This repo is for our paper: Normalization Techniques in Training DNNs: Methodology, Analysis and Application ☆84 Updated 3 years ago
- Repository containing code for the paper "Meta-Learning with Sparse Experience Replay for Lifelong Language Learning". ☆21 Updated last year
- Code for EMNLP 2020 paper CoDIR ☆41 Updated 2 years ago
- Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch ☆45 Updated 4 years ago
- Large dataset storage format for Pytorch ☆45 Updated 3 years ago
- 💪 A toolkit to help search for papers from aclanthology, arXiv and dblp. ☆45 Updated 2 years ago
- This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron … ☆32 Updated last year
- Implementation of Multistream Transformers in Pytorch ☆53 Updated 3 years ago
- Code for paper "Can contrastive learning avoid shortcut solutions?" NeurIPS 2021. ☆47 Updated 2 years ago
- PDFs and Codelabs for the Efficient Deep Learning book. ☆191 Updated last year