☆20Oct 25, 2022Updated 3 years ago
Alternatives and similar repositories for KERPLE
Users that are interested in KERPLE are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023☆139Apr 30, 2024Updated 2 years ago
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"☆40Nov 11, 2024Updated last year
- Official implementation of the transformer (TF) architecture suggested in a paper entitled "Looped Transformers as Programmable Computers…☆39Apr 8, 2023Updated 3 years ago
- ☆11May 24, 2024Updated 2 years ago
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation)☆34Aug 6, 2023Updated 2 years ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- ☆16Oct 4, 2024Updated last year
- The is the official implementation of "Lyra: Orchestrating Dual Correction in Automated Theorem Proving"☆15Jul 2, 2024Updated last year
- Efficient PScan implementation in PyTorch☆17Jan 2, 2024Updated 2 years ago
- Code for the ALiBi method for transformer language models (ICLR 2022)☆556Oct 30, 2023Updated 2 years ago
- Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure (NeurIPS 2024) + Arithmetic Transfor…☆14Oct 26, 2025Updated 7 months ago
- Algorithms for approximate attention in LLMs☆22Apr 14, 2025Updated last year
- ☆13Jun 26, 2024Updated last year
- [NeurIPS 2024] | An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding☆22Oct 10, 2024Updated last year
- Equivalent Linear Mappings of Large Language Models☆35Nov 7, 2025Updated 7 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- ☆17Oct 31, 2023Updated 2 years ago
- Codes for the EMNLP 2023 Findings paper "Self-Polish: Enhance Reasoning in Large Language Models via Problem Refining" by Zhiheng Xi, Sen…☆32May 30, 2023Updated 3 years ago
- Beyond KV Caching: Shared Attention for Efficient LLMs☆20Jul 19, 2024Updated last year
- Code for the paper "Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving"☆19May 25, 2023Updated 3 years ago
- This is a simple torch implementation of the high performance Multi-Query Attention☆16Aug 23, 2023Updated 2 years ago
- Code for "Planning and Generating Natural and Diverse Disfluent Texts as Augmentation for Disfluency Detection"☆16Apr 25, 2022Updated 4 years ago
- Engineering the state of RNN language models (Mamba, RWKV, etc.)☆32May 25, 2024Updated 2 years ago
- ☆31Mar 30, 2026Updated 2 months ago
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models".☆18Apr 25, 2025Updated last year
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- ☆18Apr 8, 2025Updated last year
- Official code for the paper "Attention as a Hypernetwork"☆57Feb 24, 2026Updated 3 months ago
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling☆40Dec 2, 2023Updated 2 years ago
- This is the official implementation of our ICML 2024 paper "MultiMax: Sparse and Multi-Modal Attention Learning""☆22Feb 9, 2026Updated 4 months ago
- ☆39Feb 26, 2024Updated 2 years ago
- Official repository Flash Local Linear Attention☆36May 28, 2026Updated last week
- [NeurIPS 2023] Code base for the Renyi Kernel Entropy (RKE) metric for generative models.☆13Jun 18, 2025Updated 11 months ago
- Recreating the minimal training methods of DeepSeek-R1 for small langauge models.☆22Feb 10, 2025Updated last year
- Persona 5 Game Menu for Web☆17Jul 14, 2023Updated 2 years ago
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- The this is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆41Oct 11, 2024Updated last year
- The open source implementation of the multi grouped query attention by the paper "GQA: Training Generalized Multi-Query Transformer Model…☆16Dec 11, 2023Updated 2 years ago
- ☆14Jul 11, 2021Updated 4 years ago
- This is the repository for paper EscapeBench: Pushing Language Models to Think Outside the Box☆18Dec 19, 2024Updated last year
- Implementation of data dimensionality reduction algorithms SVD and CUR without using library functions.☆10Jul 24, 2017Updated 8 years ago
- Unsupervised Grammar Induction with Combinatory Categorial Grammars☆10Jan 28, 2021Updated 5 years ago
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044☆35Oct 3, 2024Updated last year