instance-wise-ordered-transformer / IOT

☆20

Alternatives and similar repositories for IOT

Users interested in IOT are comparing it to the libraries listed below.


zhuohan123 / hint-nart
☆10 · Updated 5 years ago
zomux / lanmt-ebm
lanmt ebm
☆12 · Updated 5 years ago
FranxYao / RDP
Implementation of the ICML 2022 paper "Scaling Structured Inference with Randomization"
☆14 · Updated 3 years ago
yzpang / gold-off-policy-text-gen-iclr21
☆50 · Updated 4 years ago
guolinke / fused_ops
☆10 · Updated 3 years ago
tencent-ailab / ICML21_OAXE
☆28 · Updated 4 years ago
bojone / univae
A Transformer-based, single-model, multi-scale VAE
☆57 · Updated 4 years ago
microsoft / DualLearning
A dual learning toolkit developed by Microsoft Research
☆73 · Updated 2 years ago
rosewang2008 / language_modeling_via_stochastic_processes
Language modeling via stochastic processes. Oral @ ICLR 2022.
☆138 · Updated 2 years ago
mlpc-ucsd / BERT_Convolutions
(ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models.
☆21 · Updated 3 years ago
HanGuo97 / soft-Q-learning-for-text-generation
☆70 · Updated 3 years ago
yxuansu / Awesome_Diffusions
☆17 · Updated 2 years ago
da03 / Residual-EBM
Code for Residual Energy-Based Models for Text Generation in PyTorch.
☆25 · Updated 4 years ago
Alab-NII / Awesome-SciLM
Pre-trained Language Model for Scientific Text
☆46 · Updated last year
princeton-nlp / DinkyTrain
Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration 🚃
☆114 · Updated 3 years ago
CyndxAI / QKNorm
Code for the paper "Query-Key Normalization for Transformers"
☆49 · Updated 4 years ago
felixzli / synthetic_pretraining
☆38 · Updated 3 years ago
yuPeiyu98 / Latent-Diffusion-EBM
[ICML 2022] Latent Diffusion Energy-Based Model for Interpretable Text Modeling
☆67 · Updated 3 years ago
intersun / CoDIR
Code for the EMNLP 2020 paper CoDIR
☆41 · Updated 3 years ago
baoy-nlp / CNAT
Non-autoregressive Translation by Learning Target Categorical Codes
☆11 · Updated 4 years ago
XuezheMax / fairseq-apollo
Fairseq repo with the Apollo optimizer
☆114 · Updated last year
HA-Transformer / MAT
The implementation of multi-branch attentive Transformer (MAT).
☆33 · Updated 5 years ago
vvvm23 / sundae
Unofficial PyTorch implementation of "Step-unrolled Denoising Autoencoders for Text Generation"
☆24 · Updated 3 years ago
FranxYao / Distributional-Generalization-in-Natural-Language-Processing
Distributional Generalization in NLP. A roadmap.
☆88 · Updated 2 years ago
microsoft / EfficientLongSequenceModeling
☆51 · Updated 2 years ago
TransfromerMeetsGraph / GNNLearner
Solution for KDD Cup 2021
☆11 · Updated 4 years ago
HKUNLP / efficient-attention
[EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling
☆87 · Updated 2 years ago
wouterkool / stochastic-beam-search
Implementation of Stochastic Beam Search using Fairseq
☆105 · Updated 6 years ago
microsoft / Stochastic-Mixture-of-Experts
This package implements THOR: Transformer with Stochastic Experts.
☆65 · Updated 4 years ago
ylsung / vl-merging
PyTorch code for the paper "An Empirical Study of Multimodal Model Merging"
☆37 · Updated 2 years ago