bergen / EdgeTransformer
☆22 Updated 3 years ago
Alternatives and similar repositories for EdgeTransformer
Users who are interested in EdgeTransformer are comparing it to the repositories listed below.
- Official code for "Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving", ICML 2021 ☆27 Updated 3 years ago
- ☆39 Updated 3 years ago
- ☆49 Updated 4 years ago
- Code for "Discovering Non-monotonic Autoregressive Orderings with Variational Inference" (paper and code updated from ICLR 2021) ☆12 Updated last year
- [NeurIPS'20] Code for the Paper Compositional Visual Generation and Inference with Energy Based Models ☆45 Updated 2 years ago
- ☆51 Updated 2 years ago
- Blog post ☆17 Updated last year
- lanmt ebm ☆12 Updated 5 years ago
- Implementation of ICML 22 Paper: Scaling Structured Inference with Randomization ☆14 Updated 2 years ago
- STABILIZING GRADIENTS FOR DEEP NEURAL NETWORKS VIA EFFICIENT SVD PARAMETERIZATION ☆16 Updated 7 years ago
- Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces, NeurIPS 2021 ☆13 Updated 3 years ago
- Code to reproduce the results for Compositional Attention ☆60 Updated 2 years ago
- Tensorflow implementation and notebooks for Implicit Maximum Likelihood Estimation ☆67 Updated 3 years ago
- Curse-of-memory phenomenon of RNNs in sequence modelling ☆19 Updated 2 months ago
- ☆33 Updated 4 years ago
- Self-Supervised Alignment with Mutual Information ☆20 Updated last year
- ☆20 Updated last year
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" ☆17 Updated last year
- [ICML 2022] Latent Diffusion Energy-Based Model for Interpretable Text Modeling ☆65 Updated 3 years ago
- [ICML'21] Improved Contrastive Divergence Training of Energy Based Models ☆63 Updated 3 years ago
- Official repository for the paper "Going Beyond Linear Transformers with Recurrent Fast Weight Programmers" (NeurIPS 2021) ☆49 Updated last month
- Recursive Bayesian Networks ☆11 Updated 2 months ago
- Companion repository to "Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models" ☆13 Updated 2 years ago
- This repository contains some of the code used in the paper "Training Language Models with Language Feedback at Scale" ☆27 Updated 2 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 Updated 2 years ago
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing ☆50 Updated 3 years ago
- The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We s… ☆67 Updated 2 years ago
- Code Release for "Broken Neural Scaling Laws" (BNSL) paper ☆59 Updated last year
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … ☆16 Updated last month
- Efficient Scaling laws and collaborative pretraining. ☆16 Updated 5 months ago