☆20Oct 25, 2022Updated 3 years ago
Alternatives and similar repositories for KERPLE
Users that are interested in KERPLE are comparing it to the libraries listed below
Sorting:
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023☆138Apr 30, 2024Updated last year
- [ICML'25] "Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding" by Jiajun Zhu, Peihao Wang, Ruisi…☆14Jun 6, 2025Updated 8 months ago
- ☆13May 30, 2022Updated 3 years ago
- This is a simple torch implementation of the high performance Multi-Query Attention☆16Aug 23, 2023Updated 2 years ago
- Algorithms for approximate attention in LLMs☆21Apr 14, 2025Updated 10 months ago
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models".☆18Apr 25, 2025Updated 10 months ago
- [NeurIPS 2024] | An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding☆22Oct 10, 2024Updated last year
- Efficient PScan implementation in PyTorch☆17Jan 2, 2024Updated 2 years ago
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"☆40Nov 11, 2024Updated last year
- 🎮Manipulates mobile phones just like how you would. Official code for "MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficien…☆27Oct 10, 2025Updated 4 months ago
- Official implementation of ECCV24 paper: POA☆24Aug 8, 2024Updated last year
- Equivalent Linear Mappings of Large Language Models☆30Nov 7, 2025Updated 3 months ago
- Beyond KV Caching: Shared Attention for Efficient LLMs☆20Jul 19, 2024Updated last year
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"☆59Jan 5, 2026Updated last month
- ☆33Apr 22, 2025Updated 10 months ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 8 months ago
- quick playground to animate pippin☆14Nov 11, 2024Updated last year
- A Text2SQL benchmark for evaluation of Large Language Models☆41Updated this week
- The original Shared Recurrent Memory Transformer implementation☆33Jul 11, 2025Updated 7 months ago
- entropix style sampling + GUI☆27Oct 30, 2024Updated last year
- Engineering the state of RNN language models (Mamba, RWKV, etc.)☆32May 25, 2024Updated last year
- Official implementation of the transformer (TF) architecture suggested in a paper entitled "Looped Transformers as Programmable Computers…☆30Apr 8, 2023Updated 2 years ago
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044☆35Oct 3, 2024Updated last year
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025)☆32Apr 9, 2025Updated 10 months ago
- ☆36Feb 26, 2024Updated 2 years ago
- [ACL'25 Findings] Official repo for "HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task"☆37Apr 7, 2025Updated 10 months ago
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios☆16Oct 18, 2024Updated last year
- Code repository for "Multi-Task Encoder-Dual-Decoder Modeling Framework on Mixed Frequency Data", International Journal of Forecasting, 2…☆12Feb 18, 2024Updated 2 years ago
- A red teaming agent☆18Oct 15, 2025Updated 4 months ago
- ☆18Jun 10, 2025Updated 8 months ago
- Codes for the EMNLP 2023 Findings paper "Self-Polish: Enhance Reasoning in Large Language Models via Problem Refining" by Zhiheng Xi, Sen…☆31May 30, 2023Updated 2 years ago
- AnchorAttention: Improved attention for LLMs long-context training☆214Jan 15, 2025Updated last year
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation)☆34Aug 6, 2023Updated 2 years ago
- Code for the ALiBi method for transformer language models (ICLR 2022)☆552Oct 30, 2023Updated 2 years ago
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling☆40Dec 2, 2023Updated 2 years ago
- ☆35Dec 12, 2023Updated 2 years ago
- Symphony — A decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devi…☆30Oct 30, 2025Updated 4 months ago
- The official implement of paper 《DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents》☆29Oct 23, 2025Updated 4 months ago
- Libraries, guides, blueprints, and sample code, to enable rapidly building 0-1 applications on iOS, Android and web.☆11May 12, 2023Updated 2 years ago