bojone / tigerLinks

A Tight-fisted Optimizer

☆48

Alternatives and similar repositories for tiger

Users that are interested in tiger are comparing it to the libraries listed below

Sorting:

nengwp / Lion-vs-Adam
Lion and Adam optimization comparison
☆61Updated 2 years ago
hazdzz / tiger
A Tight-fisted Optimizer (Tiger), implemented in PyTorch.
☆12Updated last year
ZhuiyiTechnology / GAU-alpha
基于Gated Attention Unit的Transformer模型（尝鲜版）
☆98Updated 2 years ago
thu-ml / low-bit-optimizers
Low-bit optimizers for PyTorch
☆129Updated last year
OpenNLPLab / Transnormer
[EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer
☆61Updated last year
berlino / gated_linear_attention
☆105Updated last year
YuchuanTian / DiJiang
[ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at…
☆101Updated last year
HKUNLP / efficient-attention
[EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling
☆86Updated 2 years ago
YuchuanTian / RethinkTinyLM
[ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”
☆122Updated 6 months ago
OpenNLPLab / lightning-attention
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
☆323Updated 4 months ago
jiahe7ay / infini-mini-transformer
This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train…
☆58Updated last year
RUCAIBox / QuantizedEmpirical
☆14Updated last year
ziplab / QLLM
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…
☆27Updated last year
princeton-nlp / CoFiPruning
[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
☆196Updated 2 years ago
ModelTC / QLLM
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…
☆39Updated last year
yuzhenmao / IceFormer
Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024).
☆25Updated last year
WailordHe / DenseSSM
A repository for DenseSSMs
☆87Updated last year
NormXU / Consistent-DynamicNTKRoPE
An Experiment on Dynamic NTK Scaling RoPE
☆64Updated last year
transformer-vq / transformer_vq
☆195Updated last year
kssteven418 / LTP
[KDD'22] Learned Token Pruning for Transformers
☆98Updated 2 years ago
OpenNLPLab / TransnormerLLM
Official implementation of TransNormerLLM: A Faster and Better LLM
☆247Updated last year
kyegomez / AttentionIsOFFByOne
Implementation of "Attention Is Off By One" by Evan Miller
☆193Updated last year
mdy666 / Qwen-Native-Sparse-Attention
qwen-nsa
☆68Updated 3 months ago
pprp / Pruner-Zero
[ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
☆89Updated 7 months ago
CASIA-IVA-Lab / FLAP
[AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models
☆56Updated last year
L1aoXingyu / llm-infer-bench
☆11Updated last year
kyegomez / Blockwise-Parallel-Transformer
32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.
☆48Updated 2 years ago
cofe-ai / Mu-scaling
Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales
☆32Updated 2 years ago
VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆52Updated 2 years ago
htqin / IR-QLoRA
[ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…
☆65Updated last year