WangHuiNEU / Transformer_Knowlegde
Understanding the Transformer from its underlying mechanisms
☆22
Related projects:
- A lightweight script for maintaining a lot of machine learning experiments. ☆88
- A Transformer model based on the Gated Attention Unit (preview version). ☆95
- The official repo of INF-34B models trained by INF Technology. ☆32
- ☆125
- Yet another PyTorch Trainer and some core components for deep learning. ☆202
- Implementations of several positional-encoding schemes used in the Transformer. ☆36
- A Tight-fisted Optimizer. ☆46
- Must-read papers on improving efficiency for pre-trained language models. ☆100
- An implementation of Transformer, BERT, GPT, and diffusion models for learning purposes. ☆139
- DeepSpeed tutorials, annotated examples, and study notes (efficient large-model training). ☆94
- The Roadmap for LLMs. ☆84
- The pure and clear PyTorch Distributed Training Framework. ☆276
- Lion and Adam optimization comparison. ☆56
- ☆170
- [ICLR 2024] EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling (https://arxiv.org/abs/2310.04691). ☆111
- A list of papers about data quality in Large Language Models (LLMs). ☆18
- FLASHQuad_pytorch. ☆66
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales. ☆30
- A generalized framework for subspace tuning methods in parameter-efficient fine-tuning. ☆70
- Rectified Rotary Position Embeddings. ☆329
- ☆139
- Easier Configuration. ☆30
- [ACL 2022] Structured Pruning Learns Compact and Accurate Models (https://arxiv.org/abs/2204.00408). ☆188
- A paper list about diffusion models for natural language processing. ☆170
- LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment. ☆191
- RoFormer V1 & V2 in PyTorch. ☆462
- NTK-scaled version of the ALiBi position encoding in the Transformer. ☆64
- Paper List for In-context Learning 🌷 ☆165
- [ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention. ☆178
- 😎 A simple and easy-to-use toolkit for GPU scheduling. ☆40