deep-spin / infinite-former

☆67

Alternatives and similar repositories for infinite-former

Users interested in infinite-former are comparing it to the libraries listed below.

McGill-NLP / length-generalization
Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023
☆138 · Updated last year
booydar / LM-RMT
Recurrent Memory Transformer
☆154 · Updated 2 years ago
sunyt32 / torchscale
Transformers at any scale
☆42 · Updated last year
PiotrNawrot / dynamic-pooling
Efficient Transformers with Dynamic Token Pooling
☆65 · Updated 2 years ago
google-deepmind / randomized_positional_encodings
Randomized Positional Encodings Boost Length Generalization of Transformers
☆83 · Updated last year
Dahoas / reward-modeling
☆98 · Updated 2 years ago
seonghyeonye / TAPP
[AAAI 2024] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following
☆78 · Updated last year
bigscience-workshop / architecture-objective
☆98 · Updated 2 years ago
lucidrains / memory-editable-transformer
My explorations into editing the knowledge and memories of an attention network
☆35 · Updated 2 years ago
yxuansu / Contrastive_Search_Is_What_You_Need
[TMLR'23] Contrastive Search Is What You Need For Neural Text Generation
☆121 · Updated 2 years ago
HazyResearch / prefix-linear-attention
☆57 · Updated last year
babylm / evaluation-pipeline-2023
Evaluation pipeline for the BabyLM Challenge 2023.
☆77 · Updated 2 years ago
martiansideofthemoon / rankgen
Official code and model checkpoints for our EMNLP 2022 paper "RankGen - Improving Text Generation with Large Ranking Models" (https://arx…
☆138 · Updated 2 years ago
facebookresearch / NPM
The original implementation of Min et al. "Nonparametric Masked Language Modeling" (paper: https://arxiv.org/abs/2212.01349)
☆158 · Updated 2 years ago
frankxu2004 / knnlm-why
Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"
☆59 · Updated 2 years ago
tianjunz / HIR
☆159 · Updated 2 years ago
abhishekpanigrahi1996 / transformer_in_transformer
☆45 · Updated 2 years ago
cimeister / typical-sampling
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
☆81 · Updated 3 years ago
rosewang2008 / language_modeling_via_stochastic_processes
Language modeling via stochastic processes. Oral @ ICLR 2022.
☆138 · Updated 2 years ago
feyzaakyurek / rl4f
Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023.
☆64 · Updated last year
kernelmachine / silo-lm
SILO Language Models code repository
☆83 · Updated last year
NohTow / PPL-MCTS
Repository for the code of the "PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided Decoding" paper, NAACL'22
☆66 · Updated 3 years ago
XuezheMax / fairseq-apollo
FairSeq repo with Apollo optimizer
☆114 · Updated last year
basusourya / mirostat
Code for the paper "Mirostat: A Perplexity-Controlled Neural Text Decoding Algorithm" (https://arxiv.org/abs/2007.14966).
☆61 · Updated 3 years ago
SimengSun / ChapterBreak
☆11 · Updated last year
jzbjyb / ReAtt
Retrieval as Attention
☆82 · Updated 2 years ago
microsoft / AdaMix
This is the implementation of the paper AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning (https://arxiv.org/abs/2205.1…
☆136 · Updated 2 years ago
kernelmachine / demix
DEMix Layers for Modular Language Modeling
☆54 · Updated 4 years ago
kyleliang919 / Long-context-transformers
Exploring finetuning public checkpoints on filtered 8K sequences on Pile
☆116 · Updated 2 years ago
r-three / RAD
Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
☆45 · Updated 2 months ago