chuanyang-Zheng / DAPE
This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation".
☆38 · Updated 9 months ago
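For context, the core idea named in the title is to make the attention bias a function of the data: rather than adding a fixed positional bias B (e.g., ALiBi-style) to the attention logits A, a small per-head MLP maps (A, B) to an adaptive bias. Below is a minimal PyTorch sketch of that recipe; the class name `DAPEBias`, the MLP width, and the exact combination of terms are illustrative assumptions, not the repository's actual API.

```python
import torch
import torch.nn as nn

class DAPEBias(nn.Module):
    """Hypothetical sketch of a data-adaptive positional bias.

    Instead of adding a fixed positional bias B to the attention
    logits A, feed (A, B) through a small MLP so the bias can adapt
    to the input. This mirrors the paper's high-level recipe; the
    details here are assumptions, not the official implementation.
    """

    def __init__(self, num_heads: int, hidden: int = 32):
        super().__init__()
        # Per-(query, key) MLP over the head dimension: (A, B) -> adaptive bias.
        self.mlp = nn.Sequential(
            nn.Linear(2 * num_heads, hidden),
            nn.SiLU(),
            nn.Linear(hidden, num_heads),
        )

    def forward(self, attn_logits: torch.Tensor, static_bias: torch.Tensor) -> torch.Tensor:
        # attn_logits: (batch, heads, q_len, k_len), e.g. QK^T / sqrt(d)
        # static_bias: (1, heads, q_len, k_len), e.g. an ALiBi-style bias
        x = torch.cat([attn_logits, static_bias.expand_as(attn_logits)], dim=1)
        x = x.permute(0, 2, 3, 1)                   # (batch, q, k, 2*heads)
        adaptive = self.mlp(x).permute(0, 3, 1, 2)  # back to (batch, heads, q, k)
        # Final logits combine the content term, static bias, and adaptive bias.
        return attn_logits + static_bias + adaptive
```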
Alternatives and similar repositories for DAPE
Users interested in DAPE are comparing it to the libraries listed below.
- ☆90 · Updated 2 months ago
- Code for paper "Patch-Level Training for Large Language Models" ☆85 · Updated 8 months ago
- PyTorch implementation of StableMask (ICML'24) ☆13 · Updated last year
- Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning" ☆81 · Updated last year
- ☆30 · Updated last year
- The official implementation of Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free ☆44 · Updated 2 months ago
- Code for Merging Large Language Models ☆32 · Updated 11 months ago
- ☆51 · Updated last week
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" ☆115 · Updated last week
- ☆23 · Updated 3 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆52 · Updated 2 years ago
- Code for ICLR 2025 paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆91 · Updated 2 months ago
- [NeurIPS 2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆86 · Updated 9 months ago
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning ☆31 · Updated 2 years ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆96 · Updated last week
- [ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization ☆75 · Updated last month
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆39 · Updated last week
- [NeurIPS 2024] An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding ☆18 · Updated 9 months ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆40 · Updated last year
- ACL 2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs; preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of… ☆31 · Updated last month
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training ☆86 · Updated 7 months ago
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation, ICML 2024 ☆22 · Updated last year
- [ICLR 2024] Repository for the paper "DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning" ☆95 · Updated last year
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆28 · Updated 3 months ago
- Code for Heima ☆49 · Updated 2 months ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆49 · Updated last month
- Mixture of Attention Heads ☆47 · Updated 2 years ago
- Official repository for the paper "DeepCritic: Deliberate Critique with Large Language Models" ☆31 · Updated 3 weeks ago
- [NeurIPS 2023 Spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆66 · Updated last year
- A repository for DenseSSMs ☆87 · Updated last year