Official code for the NeurIPS 2025 paper "RAT: Bridging RNN Efficiency and Attention Accuracy in Language Modeling" (https://arxiv.org/abs/2507.04416)
☆26 · Dec 10, 2025 · Updated 6 months ago
Alternatives and similar repositories for RAT
Users that are interested in RAT are comparing it to the libraries listed below.
- Official codebase for "Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions" (Matrenok … · ☆30 · Dec 8, 2025 · Updated 6 months ago
- The official code of "Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers" · ☆20 · Jul 24, 2024 · Updated last year
- mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models · ☆11 · Jan 19, 2024 · Updated 2 years ago
- ☆21 · Dec 5, 2022 · Updated 3 years ago
- Codebase to fully reproduce the results of "No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO" (M… · ☆32 · Nov 20, 2024 · Updated last year
- ☆13 · Feb 7, 2023 · Updated 3 years ago
- ☆16 · May 14, 2024 · Updated 2 years ago
- Implementation of Cascaded Head-colliding Attention (ACL'2021) · ☆11 · Sep 16, 2021 · Updated 4 years ago
- Implementation and experiments for Partially Supervised NER via Expected Entity Ratio in TACL 2022 · ☆14 · Nov 7, 2022 · Updated 3 years ago
- Sparse Autoencoders for Stable Diffusion XL models · ☆89 · Oct 30, 2025 · Updated 7 months ago
- ☆16 · Mar 22, 2023 · Updated 3 years ago
- Xmixers: A collection of SOTA efficient token/channel mixers · ☆28 · Sep 4, 2025 · Updated 9 months ago
- Code and data for the paper "Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?" · ☆26 · Jun 3, 2025 · Updated last year
- ☆17 · Oct 27, 2024 · Updated last year
- Code for "EMS: 3D Eyebrow Modeling from Single-view Images" (SIGGRAPH Asia 2023) · ☆14 · May 3, 2025 · Updated last year
- PyTorch implementation of Graph-to-Graph Transformer for Transition-based Dependency Parsing, accepted to EMNLP 2020 · ☆22 · Nov 28, 2022 · Updated 3 years ago
- UNLP 2025 Shared Task on Detecting Social Media Manipulation · ☆23 · Aug 4, 2025 · Updated 10 months ago
- Ukrainian ELECTRA model · ☆12 · Mar 11, 2023 · Updated 3 years ago
- ☆25 · May 25, 2024 · Updated 2 years ago
- CSCS User Lab Day – Meet the Swiss National Supercomputing Centre · ☆13 · Sep 12, 2025 · Updated 9 months ago
- A probabilistic model for contextual word representation, accepted to ACL 2023 Findings · ☆25 · Oct 22, 2023 · Updated 2 years ago
- UDapter is a multilingual dependency parser that uses "contextual" adapters together with language-typology features for language-specifi… · ☆31 · Dec 5, 2022 · Updated 3 years ago
- ☆22 · Apr 13, 2018 · Updated 8 years ago
- The PyTorch implementation of the paper "KERMIT: Knowledge Graph Completion of Enhanced Relation Modeling with Inverse Transformation" · ☆16 · Jul 4, 2025 · Updated 11 months ago
- TOD-Flow: Modeling the Structure of Task-Oriented Dialogues · ☆13 · Feb 7, 2024 · Updated 2 years ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling · ☆43 · Dec 29, 2025 · Updated 6 months ago
- Zero-shot NER fine-tuning · ☆14 · Mar 17, 2025 · Updated last year
- Block-Recurrent Dynamics in ViTs 🦖 · ☆46 · May 21, 2026 · Updated last month
- ☆125 · Feb 19, 2026 · Updated 4 months ago
- A curated collection of state-of-the-art Image Denoising research, tools, and datasets. 🌟 Star if you like it! 🌟 · ☆12 · Jun 14, 2026 · Updated 2 weeks ago
- ☆35 · Feb 24, 2026 · Updated 4 months ago
- FlexiTokens · ☆23 · Dec 27, 2025 · Updated 6 months ago
- Official repository for Flash Local Linear Attention · ☆37 · May 28, 2026 · Updated last month
- [ICLR 2025] SDTT: a simple and effective distillation method for discrete diffusion models · ☆51 · Feb 26, 2026 · Updated 4 months ago
- Tutorial to start working with Multiple Instance Learning · ☆16 · Jul 5, 2023 · Updated 2 years ago
- Code for "BERTifying the Hidden Markov Model for Multi-Source Weakly Supervised Named Entity Recognition" · ☆32 · Jun 20, 2023 · Updated 3 years ago
- ☆14 · Mar 22, 2024 · Updated 2 years ago
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation) · ☆35 · Aug 6, 2023 · Updated 2 years ago
- Tower Parse: Low-Resource Dependency Parsing via Hierarchical Source Selection · ☆15 · Aug 20, 2021 · Updated 4 years ago