Code for the paper "Don't Pay Attention"
☆56 · Sep 25, 2025 · Updated 6 months ago
Alternatives and similar repositories for avey-dpa
Users that are interested in avey-dpa are comparing it to the libraries listed below.
- Parallel Associative Scan for Language Models ☆18 · Jan 8, 2024 · Updated 2 years ago
- Semantic alignment of astronomical data with natural language using multi-modal models. (Jax) Code associated with https://arxiv.org/abs/… ☆17 · Oct 18, 2024 · Updated last year
- AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning (Published in TMLR) ☆23 · Oct 15, 2024 · Updated last year
- Combining SOAP and MUON ☆19 · Feb 11, 2025 · Updated last year
- Implementation of Hyena Hierarchy in JAX ☆10 · Apr 30, 2023 · Updated 2 years ago
- ☆45 · Apr 30, 2018 · Updated 7 years ago
- ☆19 · Dec 4, 2025 · Updated 4 months ago
- Fluid Language Model Benchmarking ☆27 · Sep 16, 2025 · Updated 6 months ago
- Zeta implementation of a reusable, plug-and-play feedforward from the paper "Exponentially Faster Language Modeling" ☆16 · Nov 11, 2024 · Updated last year
- Ultra-minimal autoregressive diffusion model for image generation ☆21 · Dec 26, 2025 · Updated 3 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆42 · Dec 29, 2025 · Updated 3 months ago
- Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023) ☆98 · Dec 5, 2024 · Updated last year
- ☆20 · Apr 17, 2023 · Updated 2 years ago
- [ICML 2024] SIRFShampoo: Structured inverse- and root-free Shampoo in PyTorch (https://arxiv.org/abs/2402.03496) ☆15 · Nov 4, 2024 · Updated last year
- Stochastic trace estimation using JAX ☆17 · Aug 20, 2025 · Updated 7 months ago
- Simple & Scalable Pretraining for Neural Architecture Research ☆325 · Mar 31, 2026 · Updated last week
- Official code for ICLR 2023 paper "ContraNorm: A Contrastive Learning Perspective on Oversmoothing and Beyond" ☆35 · Apr 24, 2023 · Updated 2 years ago
- ☆29 · Jul 9, 2024 · Updated last year
- ☆22 · Nov 9, 2024 · Updated last year
- Flow-Modulated Scoring for Semantic-Aware Knowledge Graph Completion. ☆18 · Mar 25, 2026 · Updated 2 weeks ago
- This repo is built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks. ☆35 · Oct 28, 2025 · Updated 5 months ago
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism" ☆15 · Apr 30, 2025 · Updated 11 months ago
- Fast semantic search for bioRxiv manuscripts ☆12 · Feb 16, 2025 · Updated last year
- ☆11 · Oct 25, 2020 · Updated 5 years ago
- Manipulating cointegrated pairs to achieve a market-neutral strategy that outperforms indices ☆11 · Jan 12, 2021 · Updated 5 years ago
- Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence ☆60 · Nov 11, 2025 · Updated 4 months ago
- Official Repo for Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics ☆72 · Mar 26, 2026 · Updated 2 weeks ago
- This repository contains the implementation of the method proposed in the MDGNN_BS paper ☆12 · May 9, 2024 · Updated last year
- The accompanying code for "Simplifying and Understanding State Space Models with Diagonal Linear RNNs" (Ankit Gupta, Harsh Mehta, Jonatha… ☆23 · Dec 30, 2022 · Updated 3 years ago
- qwen3 experiments ☆34 · Jul 1, 2025 · Updated 9 months ago
- A language server for bibfile citations ☆16 · Jan 8, 2026 · Updated 3 months ago
- PyTorch Implementation of Context-Aware Sequential Model for Multi-Behaviour Recommendation https://arxiv.org/abs/2312.09684 ☆10 · May 31, 2024 · Updated last year
- Research and experimental code related to Opacus, an open-source library for training PyTorch models with Differential Privacy ☆18 · Oct 9, 2024 · Updated last year
- [Accepted by TNNLS] Source Code for Relational Redundancy-Free Graph Clustering ☆13 · Sep 24, 2023 · Updated 2 years ago
- ☆36 · Feb 26, 2024 · Updated 2 years ago
- Graph in Graph Neural Network (https://arxiv.org/abs/2407.00696) ☆15 · Sep 12, 2024 · Updated last year
- StyleGAN2 - Official TensorFlow Implementation with practical improvements ☆11 · Apr 17, 2020 · Updated 5 years ago
- [ACL 2024] Predicting the Unpredictable: Uncertainty-Aware Reasoning over Temporal Knowledge Graphs via Diffusion Process ☆18 · Oct 7, 2024 · Updated last year
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆68 · Apr 24, 2024 · Updated last year