☆20Oct 25, 2022Updated 3 years ago
Alternatives and similar repositories for KERPLE
Users that are interested in KERPLE are comparing it to the libraries listed below.
- Official implementation of the transformer (TF) architecture suggested in a paper entitled "Looped Transformers as Programmable Computers…☆38Apr 8, 2023Updated 3 years ago
- Official implementation of ECCV24 paper: POA☆24Aug 8, 2024Updated last year
- ☆38Dec 12, 2023Updated 2 years ago
- ☆13May 30, 2022Updated 3 years ago
- Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure (NeurIPS 2024) + Arithmetic Transfor…☆14Oct 26, 2025Updated 5 months ago
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation)☆34Aug 6, 2023Updated 2 years ago
- This is the official implementation of "Lyra: Orchestrating Dual Correction in Automated Theorem Proving"☆15Jul 2, 2024Updated last year
- Efficient PScan implementation in PyTorch☆17Jan 2, 2024Updated 2 years ago
- Code for the ALiBi method for transformer language models (ICLR 2022)☆555Oct 30, 2023Updated 2 years ago
- Algorithms for approximate attention in LLMs☆22Apr 14, 2025Updated 11 months ago
- ☆13Jun 26, 2024Updated last year
- 🧮 Algebraic Positional Encodings.☆20Aug 20, 2025Updated 7 months ago
- Transformers at any scale☆42Jan 18, 2024Updated 2 years ago
- ☆17Oct 31, 2023Updated 2 years ago
- 📄 Evidence Retrieval and Claim Verification for the FEVER shared task using Transformer Networks☆12Feb 21, 2020Updated 6 years ago
- Beyond KV Caching: Shared Attention for Efficient LLMs☆20Jul 19, 2024Updated last year
- Code for the paper "Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving"☆19May 25, 2023Updated 2 years ago
- This is a simple torch implementation of the high performance Multi-Query Attention☆16Aug 23, 2023Updated 2 years ago
- Engineering the state of RNN language models (Mamba, RWKV, etc.)☆32May 25, 2024Updated last year
- ☆26Mar 30, 2026Updated last week
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models".☆18Apr 25, 2025Updated 11 months ago
- Official code for the paper "Attention as a Hypernetwork"☆55Feb 24, 2026Updated last month
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling☆40Dec 2, 2023Updated 2 years ago
- ☆36Feb 26, 2024Updated 2 years ago
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…☆23Oct 1, 2025Updated 6 months ago
- Recreating the minimal training methods of DeepSeek-R1 for small language models.☆22Feb 10, 2025Updated last year
- Persona 5 Game Menu for Web☆12Jul 14, 2023Updated 2 years ago
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆41Oct 11, 2024Updated last year
- The open source implementation of the multi grouped query attention by the paper "GQA: Training Generalized Multi-Query Transformer Model…☆15Dec 11, 2023Updated 2 years ago
- Code for the CVPR '23 paper, "Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning"☆10Jun 9, 2023Updated 2 years ago
- Format conversion and graphical representation of [Universal Dependencies](http://universaldependencies.org) trees.☆12Sep 3, 2024Updated last year
- Stick-breaking attention☆63Jul 1, 2025Updated 9 months ago
- ☆14Jul 11, 2021Updated 4 years ago
- This is the repository for paper EscapeBench: Pushing Language Models to Think Outside the Box☆18Dec 19, 2024Updated last year
- Dependency syntactic parser and formal grammar for Natural Languages☆12Apr 29, 2024Updated last year
- Implementation of data dimensionality reduction algorithms SVD and CUR without using library functions.☆10Jul 24, 2017Updated 8 years ago
- Converted Albert Chinese models (for pytorch-transformers)☆19Mar 6, 2020Updated 6 years ago
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044☆35Oct 3, 2024Updated last year
- Convert CoNLL output of a dependency parser into a latex or graphviz tree☆13Mar 26, 2020Updated 6 years ago