hazdzz / Tiger
A Tight-fisted Optimizer (Tiger), implemented in PyTorch.
☆11Updated 8 months ago
Alternatives and similar repositories for Tiger:
Users who are interested in Tiger are comparing it to the libraries listed below.
- A Tight-fisted Optimizer☆47Updated 2 years ago
- Lion and Adam optimization comparison☆60Updated 2 years ago
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales☆32Updated last year
- [ICLR 2024] EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling (https://arxiv.org/abs/2310.04691)☆119Updated last year
- ICLR2023 - Tailoring Language Generation Models under Total Variation Distance☆21Updated 2 years ago
- [ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models☆34Updated 3 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models☆76Updated last year
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…☆48Updated 2 years ago
- An Experiment on Dynamic NTK Scaling RoPE☆62Updated last year
- [EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer☆60Updated last year
- Code for preprint "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)"☆36Updated 2 months ago
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling☆82Updated 2 years ago
- Transformer model based on the Gated Attention Unit (preview version)☆97Updated 2 years ago
- huggingface ChineseBert Tokenizer☆15Updated 2 years ago
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train…☆56Updated 11 months ago
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning☆31Updated last year
- The code of paper "Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation" published at NeurIPS 202…☆45Updated 2 years ago
- A collection of instruction data and scripts for machine translation.☆20Updated last year
- Code for paper: A Neural Span-Based Continual Named Entity Recognition Model☆16Updated last year
- Official code for ICLR 2022 paper: "PoNet: Pooling Network for Efficient Token Mixing in Long Sequences".☆31Updated last year
- Converting Mixtral-8x7B to Mixtral-[1~7]x7B☆22Updated last year
- Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method; GKD: A General Knowledge Distillation…☆32Updated last year
- Code for paper "Patch-Level Training for Large Language Models"☆81Updated 4 months ago
- Plug-and-Play Document Modules for Pre-trained Models☆25Updated last year
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models☆45Updated last month
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…☆64Updated 10 months ago