microsoft / EfficientLongSequenceModeling

☆51

Alternatives and similar repositories for EfficientLongSequenceModeling

Users who are interested in EfficientLongSequenceModeling compare it to the libraries listed below.

OpenNLPLab / HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆65 Updated last year
sunyt32 / torchscale
Transformers at any scale
☆41 Updated last year
HazyResearch / prefix-linear-attention
☆56 Updated last year
Shark-NLP / CAB
☆31 Updated 2 years ago
XuezheMax / fairseq-apollo
FairSeq repo with Apollo optimizer
☆114 Updated last year
rosewang2008 / language_modeling_via_stochastic_processes
Language modeling via stochastic processes. Oral @ ICLR 2022.
☆138 Updated 2 years ago
PiotrNawrot / dynamic-pooling
Efficient Transformers with Dynamic Token Pooling
☆64 Updated 2 years ago
whyNLP / Probabilistic-Transformer
A probabilistic model for contextual word representation. Accepted to ACL 2023 Findings.
☆25 Updated 2 years ago
thunlp / DPT
☆13 Updated 3 years ago
lsj2408 / URPE
[NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation)
☆33 Updated 2 years ago
RobertCsordas / ndr
The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization".
☆33 Updated 4 months ago
CyndxAI / QKNorm
Code for the paper "Query-Key Normalization for Transformers"
☆49 Updated 4 years ago
deep-spin / infinite-former
☆67 Updated last year
chijames / KERPLE
☆19 Updated 3 years ago
machelreid / diffuser
DiffusER: Discrete Diffusion via Edit-based Reconstruction (Reid, Hellendoorn & Neubig, 2022)
☆54 Updated 2 months ago
abhishekpanigrahi1996 / transformer_in_transformer
☆45 Updated 2 years ago
OpenNLPLab / Tnn
[ICLR 2023] Official implementation of TNN in our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling
☆80 Updated last year
Noahs-ARK / RFA
☆33 Updated 4 years ago
HazyResearch / skill-it
Skill-It! A Data-Driven Skills Framework for Understanding and Training Language Models
☆47 Updated 2 years ago
ThomasScialom / T0_continual_learning
Adding new tasks to T0 without catastrophic forgetting
☆33 Updated 3 years ago
jzbjyb / ReAtt
Retrieval as Attention
☆82 Updated 2 years ago
frankxu2004 / knnlm-why
Repo for the ICML 2023 paper "Why do Nearest Neighbor Language Models Work?"
☆59 Updated 2 years ago
bigscience-workshop / architecture-objective
☆98 Updated 2 years ago
sustcsonglin / mamba-triton
☆48 Updated last year
FranxYao / RDP
Implementation of the ICML 2022 paper "Scaling Structured Inference with Randomization"
☆14 Updated 3 years ago
Doraemonzzz / tnn-pytorch
☆19 Updated 2 years ago
SimengSun / ChapterBreak
☆11 Updated last year
joeljang / ELM
[ICML 2023] Exploring the Benefits of Training Expert Language Models over Instruction Tuning
☆99 Updated 2 years ago
McGill-NLP / length-generalization
Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023
☆136 Updated last year
MikeWangWZHL / Zemi
Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks", ACL 2023 Findings
☆16 Updated 2 years ago