hazdzz / tigerLinks

A Tight-fisted Optimizer (Tiger), implemented in PyTorch.

☆12

Alternatives and similar repositories for tiger

Users that are interested in tiger are comparing it to the libraries listed below

Sorting:

bojone / tiger
A Tight-fisted Optimizer
☆50Updated 2 years ago
ZhuiyiTechnology / GAU-alpha
基于Gated Attention Unit的Transformer模型（尝鲜版）
☆98Updated 2 years ago
nengwp / Lion-vs-Adam
Lion and Adam optimization comparison
☆64Updated 2 years ago
DRSY / EMO
[ICLR 2024]EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling(https://arxiv.org/abs/2310.04691)
☆126Updated last year
OpenNLPLab / Transnormer
[EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer
☆63Updated 2 years ago
JunnYu / ChineseBert_pytorch
huggingface ChineseBert Tokenizer
☆16Updated 3 years ago
jiahe7ay / infini-mini-transformer
This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train…
☆58Updated last year
TianduoWang / DiffAug
[EMNLP 2022] Differentiable Data Augmentation for Contrastive Sentence Representation Learning. https://arxiv.org/abs/2210.16536
☆40Updated 3 years ago
Caiyun-AI / MUDDFormer
☆86Updated 6 months ago
JunnYu / FLASHQuad_pytorch
FLASHQuad_pytorch
☆68Updated 3 years ago
lxchtan / PoNet
Official code for ICLR 2022 paper: "PoNet: Pooling Network for Efficient Token Mixing in Long Sequences".
☆33Updated 2 years ago
Namco0816 / PT-BERT
ACL 2022(findings): A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive Learning Framework for Sentence Embeddings
☆18Updated 3 years ago
WailordHe / DenseSSM
A repository for DenseSSMs
☆89Updated last year
yikangshen / MoA
Mixture of Attention Heads
☆51Updated 3 years ago
bojone / univae
基于Transformer的单模型、多尺度的VAE模型
☆57Updated 4 years ago
zhaominyiz / EPiDA
Official Code for 'EPiDA: An Easy Plug-in Data Augmentation Framework for High Performance Text Classification' - NAACL 2022
☆23Updated 3 years ago
WhereIsAI / BiLLM
Tool for converting LLMs from uni-directional to bi-directional by removing causal mask for tasks like classification and sentence embedd…
☆62Updated 11 months ago
berlino / gated_linear_attention
☆105Updated last year
thu-coai / TaiLr
ICLR2023 - Tailoring Language Generation Models under Total Variation Distance
☆21Updated 2 years ago
HKUNLP / efficient-attention
[EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling
☆87Updated 2 years ago
JunnYu / GAU-alpha-pytorch
GAU-alpha-pytorch
☆20Updated 3 years ago
Dousia / MetricPrompt
Code for KDD 2023 long paper: MetricPrompt: Prompting Model as a Relevance Metric for Few-Shot Text Classification
☆19Updated last year
wutaiqiang / LLM_KD_AKL
☆20Updated last year
NormXU / Consistent-DynamicNTKRoPE
An Experiment on Dynamic NTK Scaling RoPE
☆64Updated last year
cofe-ai / Mu-scaling
Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales
☆32Updated 2 years ago
Doraemonzzz / Transformer-Evolution-Paper
记录Transformer升级的论文笔记
☆19Updated 2 years ago
piotrpiekos / MoSA
User-friendly implementation of the Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head with expert choice rou…
☆28Updated 6 months ago
JungHoyoun / PromptCompressor
☆12Updated last year
twinkle0331 / Xcompression
[ICLR 2022] Code for paper "Exploring Extreme Parameter Compression for Pre-trained Language Models"(https://arxiv.org/abs/2205.10036)
☆22Updated 2 years ago
SimiaoZuo / MoEBERT
This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).
☆112Updated 3 years ago