jungokasai / T2RLinks

☆14

Alternatives and similar repositories for T2R

Users that are interested in T2R are comparing it to the libraries listed below

Sorting:

NonvolatileMemory / flash_attn_gqa
triton ver of gqa flash attn, based on the tutorial
☆12Updated last year
sustcsonglin / disco-pointer
Official Implementation of ACL2023: Don't Parse, Choose Spans! Continuous and Discontinuous Constituency Parsing via Autoregressive Span …
☆13Updated last year
yikangshen / megablocks
☆20Updated last year
whyNLP / Probabilistic-Transformer
A probabilitic model for contextual word representation. Accepted to ACL2023 Findings.
☆24Updated last year
Shark-NLP / CAB
☆31Updated 2 years ago
RUCAIBox / ELMER
This repository is the official implementation of our EMNLP 2022 paper ELMER: A Non-Autoregressive Pre-trained Language Model for Efficie…
☆26Updated 2 years ago
lyutyuh / structured-span-selector
A Structured Span Selector (NAACL 2022). A structured span selector with a WCFG for span selection tasks (coreference resolution, semanti…
☆21Updated 3 years ago
da03 / criticize_text_generation
A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …
☆11Updated 2 years ago
VITA-Group / Data-Efficient-Scaling
[ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang
☆14Updated last year
allenai / staged-training
Staged Training for Transformer Language Models
☆32Updated 3 years ago
renll / SparseLT
[EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing
☆14Updated 2 years ago
emorynlp / seq2seq-corenlp
☆13Updated 2 years ago
HazyResearch / prefix-linear-attention
☆55Updated last year
chijames / KERPLE
☆19Updated 2 years ago
juvi21 / CoPE-cuda
Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719
☆22Updated last year
jenni-ai / T2FW
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
☆19Updated 2 years ago
Noahs-ARK / PaLM
PyTorch implementation for PaLM: A Hybrid Parser and Language Model.
☆10Updated 5 years ago
OpenLMLab / ParallelTokenizer
Use the tokenizer in parallel to achieve superior acceleration
☆17Updated last year
renll / SeqBoat
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆38Updated last year
kyegomez / Reka-Torch
Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch
☆30Updated 2 weeks ago
yafuly / SyntacticGen
☆15Updated 2 years ago
sunyt32 / torchscale
Transformers at any scale
☆41Updated last year
gmftbyGMFTBY / Rep-Dropout
[NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective
☆33Updated last year
twinkle0331 / Xcompression
[ICLR 2022] Code for paper "Exploring Extreme Parameter Compression for Pre-trained Language Models"(https://arxiv.org/abs/2205.10036)
☆22Updated 2 years ago
Aleph-Alpha-Research / NeurIPS-WANT-submission-efficient-parallelization-layouts
☆22Updated last year
Shwai-He / MEO
The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)":
☆38Updated last year
microsoft / Stochastic-Mixture-of-Experts
This package implements THOR: Transformer with Stochastic Experts.
☆65Updated 3 years ago
qiuzh20 / gated_attention
The official implementation for Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
☆46Updated 2 months ago
princeton-pli / PruLong
Code for the preprint "Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?"
☆41Updated last week
Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆11Updated last month