hazdzz / Tiger

A Tight-fisted Optimizer (Tiger), implemented in PyTorch.

☆12

Alternatives and similar repositories for Tiger

Users who are interested in Tiger are comparing it to the libraries listed below.

bojone / tiger
A Tight-fisted Optimizer
☆50 Updated 2 years ago
nengwp / Lion-vs-Adam
Lion and Adam optimization comparison
☆64 Updated 2 years ago
DRSY / EMO
[ICLR 2024] EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling (https://arxiv.org/abs/2310.04691)
☆128 Updated last year
ZhuiyiTechnology / GAU-alpha
Transformer model based on the Gated Attention Unit (early-preview version)
☆98 Updated 2 years ago
lxchtan / PoNet
Official code for ICLR 2022 paper: "PoNet: Pooling Network for Efficient Token Mixing in Long Sequences".
☆33 Updated 2 years ago
HKUNLP / efficient-attention
[EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling
☆87 Updated 2 years ago
OpenNLPLab / Transnormer
[EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer
☆64 Updated 2 years ago
berlino / gated_linear_attention
☆106 Updated last year
WailordHe / DenseSSM
A repository for DenseSSMs
☆89 Updated last year
bojone / univae
A single-model, multi-scale VAE based on the Transformer
☆58 Updated 4 years ago
JunnYu / ChineseBert_pytorch
huggingface ChineseBert Tokenizer
☆16 Updated 3 years ago
Caiyun-AI / MUDDFormer
☆90 Updated 8 months ago
RunxinXu / ContrastivePruning
Source code for our AAAI'22 paper "From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression"
☆25 Updated 4 years ago
jiahe7ay / infini-mini-transformer
This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train…
☆58 Updated last year
thu-coai / TaiLr
ICLR 2023 - Tailoring Language Generation Models under Total Variation Distance
☆21 Updated 3 years ago
wuch15 / Fastformer
A PyTorch & Keras implementation and demo of Fastformer.
☆192 Updated 3 years ago
cofe-ai / Mu-scaling
Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales
☆32 Updated 2 years ago
xyltt / Linear-Transformer
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
☆24 Updated 5 years ago
ag1988 / top_k_attention
The accompanying code for "Memory-efficient Transformers via Top-k Attention" (Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonatha…
☆70 Updated 4 years ago
NormXU / Consistent-DynamicNTKRoPE
An Experiment on Dynamic NTK Scaling RoPE
☆64 Updated 2 years ago
TianduoWang / DiffAug
[EMNLP 2022] Differentiable Data Augmentation for Contrastive Sentence Representation Learning. https://arxiv.org/abs/2210.16536
☆40 Updated 3 years ago
Qznan / SpanKL
Code for paper: A Neural Span-Based Continual Named Entity Recognition Model
☆18 Updated 2 years ago
Academic-Hammer / HammerLLM
1.4B sLLM for Chinese and English - HammerLLM🔨
☆43 Updated last year
juvi21 / CoPE-cuda
Contextual Position Encoding, but with some custom CUDA kernels (https://arxiv.org/abs/2405.18719)
☆22 Updated last year
wutaiqiang / LLM_KD_AKL
☆22 Updated last year
huawei-noah / Efficient-NLP
☆95 Updated last year
piotrpiekos / MoSA
User-friendly implementation of the Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head with expert choice rou…
☆28 Updated 9 months ago
transformer-vq / transformer_vq
☆201 Updated 2 years ago
Namco0816 / PT-BERT
ACL 2022 (Findings): A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive Learning Framework for Sentence Embeddings
☆18 Updated 3 years ago
MGheini / xattn-transfer-for-mt
Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Tra…
☆33 Updated 4 years ago