YuchuanTian / DiJiang
[ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear attention mechanism.
☆98 · Updated 4 months ago
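The one-line summary above describes a DCT-based linear attention mechanism. As a rough orientation only, here is a minimal sketch of how a frequency-domain feature map can turn quadratic attention into a linear-time operation; the specific DCT-II basis, ReLU non-linearity, and normalisation below are illustrative assumptions and do not reproduce the repository's actual kernel.

```python
# Minimal sketch of DCT-based linear attention. NOT the official DiJiang
# kernel: the feature map, non-linearity, and normalisation are assumptions.
import math
import torch


def dct_feature_map(x: torch.Tensor) -> torch.Tensor:
    """Project the last dimension onto a DCT-II basis, then clamp to be
    non-negative so the resulting attention weights stay valid."""
    d = x.shape[-1]
    pos = torch.arange(d, device=x.device, dtype=x.dtype)   # sample index
    freq = pos.view(-1, 1)                                   # frequency index
    basis = torch.cos(math.pi * (pos + 0.5) * freq / d) * math.sqrt(2.0 / d)
    return torch.relu(x @ basis.T)


def linear_attention(q, k, v, eps: float = 1e-6):
    """O(n) attention: softmax(QK^T)V is approximated by phi(Q)(phi(K)^T V),
    so no n-by-n attention matrix is ever materialised."""
    q, k = dct_feature_map(q), dct_feature_map(k)
    kv = torch.einsum("bnd,bne->bde", k, v)                  # sum_n phi(k_n) v_n^T
    norm = torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps # per-token normaliser
    return torch.einsum("bnd,bde->bne", q, kv) / norm.unsqueeze(-1)


# Toy usage: batch of 2 sequences, length 128, head dimension 64.
q = torch.randn(2, 128, 64)
k = torch.randn(2, 128, 64)
v = torch.randn(2, 128, 64)
out = linear_attention(q, k, v)   # shape (2, 128, 64)
```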
Related projects
Alternatives and complementary repositories for DiJiang
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆116 · Updated 4 months ago
- Low-bit optimizers for PyTorch ☆118 · Updated last year
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models ☆184 · Updated 6 months ago
- [EMNLP 2022] Official implementation of Transnormer from the paper “The Devil in Linear Transformer” ☆54 · Updated last year
- Official repository for the paper “SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention” ☆91 · Updated last month
- [ICML 2024 Oral] The official implementation of “Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…” ☆58 · Updated 6 months ago
- Official implementation of TransNormerLLM: A Faster and Better LLM ☆229 · Updated 9 months ago
- Official implementation of “DoRA: Weight-Decomposed Low-Rank Adaptation” ☆122 · Updated 6 months ago
- Code for “Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes” ☆28 · Updated 7 months ago
- Linear Attention Sequence Parallelism (LASP) ☆64 · Updated 5 months ago
- PyTorch implementation of the PEER block from the paper “Mixture of A Million Experts” by Xu Owen He at DeepMind ☆111 · Updated 2 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code) ☆133 · Updated last month
- Official repository of “The Mamba in the Llama: Distilling and Accelerating Hybrid Models” ☆169 · Updated this week
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers ☆42 · Updated last year
- The official implementation of the paper “MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression” ☆92 · Updated last month
- Triton-based implementation of Sparse Mixture of Experts ☆184 · Updated 3 weeks ago
- A repository for DenseSSMs ☆88 · Updated 6 months ago
- Activation-aware Singular Value Decomposition for Compressing Large Language Models ☆49 · Updated 2 weeks ago
- Some preliminary explorations of Mamba's context scaling ☆190 · Updated 9 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆72 · Updated 7 months ago
- An algorithm for static activation quantization of LLMs ☆67 · Updated last month
- Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs ☆71 · Updated 4 months ago
- [NeurIPS 2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies (https://arxiv.org/abs/2407.13623) ☆67 · Updated last month
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) ☆133 · Updated last month