ZhuiyiTechnology / GAU-alpha
A Transformer model based on the Gated Attention Unit (preview version)
☆97Updated 2 years ago
Alternatives and similar repositories for GAU-alpha:
Users who are interested in GAU-alpha are comparing it to the libraries listed below
- FLASHQuad_pytorch☆67Updated 3 years ago
- An upgraded version of RoFormer☆152Updated 2 years ago
- Implementations of several position encoding schemes for Transformers☆40Updated 3 years ago
- NTK scaled version of ALiBi position encoding in Transformer.☆67Updated last year
- [ICLR 2024]EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling(https://arxiv.org/abs/2310.04691)☆119Updated last year
- TencentLLMEval is a comprehensive and extensive benchmark for artificial evaluation of large models that includes task trees, standards, …☆38Updated 2 weeks ago
- Lion and Adam optimization comparison☆60Updated 2 years ago
- A simple trial of Ladder Side-Tuning on CLUE☆19Updated 2 years ago
- ☆53Updated 2 years ago
- A Tight-fisted Optimizer☆47Updated 2 years ago
- A paper list of pre-trained language models (PLMs).☆80Updated 3 years ago
- Simple experiments with the R-Drop method on Chinese tasks☆91Updated 3 years ago
- RoFormer V1 & V2 pytorch☆491Updated 2 years ago
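- SuperCLUE-Math6: exploring a new-generation, native Chinese dataset for multi-turn, multi-step mathematical reasoning☆53Updated last year
- Simple experiments with the P-tuning method on Chinese☆139Updated 3 years ago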
- Pretrain CPM-1☆51Updated 3 years ago
- ICLR2023 - Tailoring Language Generation Models under Total Variation Distance☆21Updated 2 years ago
- Chinese instruction tuning datasets☆129Updated 11 months ago
- [ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408☆196Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2☆19Updated last year
- A unified tokenization tool for Images, Chinese and English.☆151Updated 2 years ago
- The real “Deep Learning for Humans”☆141Updated 3 years ago
- Finetune CPM-2☆82Updated 2 years ago
- Rectified Rotary Position Embeddings☆361Updated 10 months ago
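- Distributed training with PyTorch☆64Updated last year
- Sentence matching models, including the unsupervised SimCSE, ESimCSE, and PromptBERT, and the supervised SBERT and CoSENT.☆98Updated 2 years ago
- A simple implementation of text classification using Qwen2ForSequenceClassification.☆59Updated 9 months ago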
- WoBERT_pytorch☆40Updated 3 years ago
- Must-read papers on improving efficiency for pre-trained language models.☆103Updated 2 years ago
- code for Scaling Laws of RoPE-based Extrapolation☆72Updated last year