OpenNLPLab / TransnormerLLM
Official implementation of TransNormerLLM: A Faster and Better LLM
☆229 · Updated 9 months ago
Related projects
Alternatives and complementary repositories for TransnormerLLM
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models ☆184 · Updated 6 months ago
- Rectified Rotary Position Embeddings ☆341 · Updated 6 months ago
- [ICML'24] The official implementation of "Rethinking Optimization and Architecture for Tiny Language Models" ☆118 · Updated 4 months ago
- Implementation of "Attention Is Off By One" by Evan Miller ☆183 · Updated last year
- PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models (NeurIPS 2024 Spotlight) ☆265 · Updated this week
- Official PyTorch implementation of QA-LoRA ☆117 · Updated 8 months ago
- AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning (ICLR 2023) ☆275 · Updated last year
- Low-bit optimizers for PyTorch ☆119 · Updated last year
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ☆358 · Updated last month
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆558 · Updated 8 months ago
- A repository sharing the literature on long-context large language models, including methodologies and evaluation benchmarks ☆252 · Updated 3 months ago
- [ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at… ☆98 · Updated 5 months ago
- Official repository of NEFTune: Noisy Embeddings Improve Instruction Finetuning ☆384 · Updated 6 months ago
- (Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from … ☆133 · Updated 6 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code) ☆135 · Updated last month
- PB-LLM: Partially Binarized Large Language Models ☆148 · Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆129 · Updated 2 months ago
- DSIR large-scale data selection framework for language model training ☆230 · Updated 7 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆126 · Updated 5 months ago
- A prototype repo for hybrid training of pipeline parallel and distributed data parallel, with comments on core code snippets. Feel free to… ☆49 · Updated last year
- Scaling Data-Constrained Language Models ☆321 · Updated last month
- Recurrent Memory Transformer ☆149 · Updated last year
- Official code for ReLoRA from the paper "Stack More Layers Differently: High-Rank Training Through Low-Rank Updates" ☆435 · Updated 6 months ago
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch ☆293 · Updated 5 months ago
- LongQLoRA: Extend Context Length of LLMs Efficiently ☆159 · Updated last year