Code repository for the paper "MrT5: Dynamic Token Merging for Efficient Byte-level Language Models."
☆59Sep 25, 2025Updated 9 months ago
Alternatives and similar repositories for mrt5
Users that are interested in mrt5 are comparing it to the libraries listed below.
- PathPiece tokenizer☆14Nov 10, 2024Updated last year
- Digital texts in Prakrit☆11Sep 14, 2025Updated 9 months ago
- Linear Attention for Efficient Bidirectional Sequence Modeling☆16May 13, 2025Updated last year
- MARs: Multi-view Attention Regularizations for Patch-based Feature Recognition of Space Terrain☆11Nov 20, 2024Updated last year
- ☆11Nov 18, 2024Updated last year
- [NeurIPS 2024] Image Understanding Makes for A Good Tokenizer for Image Generation☆22Dec 17, 2024Updated last year
- Official repository of the paper "JIST: Joint Image and Sequence Training for Sequential Visual Place Recognition"☆24Dec 15, 2023Updated 2 years ago
- ☆18Jun 12, 2023Updated 3 years ago
- Tool to perform paired evaluation of automatic systems☆13Oct 20, 2021Updated 4 years ago
- This repository contains all code and data for the Inside Out Visual Place Recognition task☆23Nov 24, 2021Updated 4 years ago
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax"☆92Sep 12, 2025Updated 9 months ago
- The official repository for 'CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos', SIGI…☆16May 4, 2022Updated 4 years ago
- ☆25Oct 13, 2024Updated last year
- [CVPR'25] MergeVQ: A Unified Framework for Visual Generation and Representation with Token Merging and Quantization☆51Jul 22, 2025Updated 11 months ago
- Large language model Mistral for DNA☆24Sep 12, 2025Updated 9 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"☆39Jun 11, 2025Updated last year
- triple-encoders is a library for contextualizing distributed Sentence Transformers representations.☆15Sep 3, 2024Updated last year
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp…☆225Sep 18, 2025Updated 9 months ago
- SCT: An Efficient Self-Supervised Cross-View Training For Sentence Embedding (TACL)☆16Jul 27, 2024Updated last year
- Statewide Visual Geolocalization in the Wild (ECCV 2024)☆75Dec 2, 2024Updated last year
- Experiments for efforts to train a new and improved t5☆76Apr 15, 2024Updated 2 years ago
- Visual Place Recognition☆31Nov 25, 2025Updated 7 months ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…☆29Apr 17, 2024Updated 2 years ago
- The real GPT-4 with image access (You probably don't have access)☆12Mar 17, 2023Updated 3 years ago
- Official code for "To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition" CVPR IMW 2025☆38Oct 4, 2025Updated 9 months ago
- Convert Transkribus PAGE-XML to standard PAGE-XML☆12Dec 10, 2025Updated 6 months ago
- Named entity annotation tool☆28Jul 6, 2023Updated 2 years ago
- "Graph Convolutions Enrich the Self-Attention in Transformers!" NeurIPS 2024☆27Mar 19, 2025Updated last year
- (ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models.☆21Jul 13, 2022Updated 3 years ago
- NanoGPT (124M) quality in 2.67B tokens☆28Sep 17, 2025Updated 9 months ago
- [NAACL 2024] A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining☆18Nov 26, 2023Updated 2 years ago
- ResiDual: Transformer with Dual Residual Connections, https://arxiv.org/abs/2304.14802☆98Aug 18, 2023Updated 2 years ago
- ☆101Jul 4, 2025Updated last year
- Code for the paper "On the Expressivity Role of LayerNorm in Transformers' Attention" (Findings of ACL'2023)☆60Sep 27, 2024Updated last year
- ☆14Aug 12, 2022Updated 3 years ago
- Dataset and Baselines for "You are here! Finding position and orientation on a 2D map from a single image: The Flatlandia localization pr…☆11Sep 15, 2023Updated 2 years ago
- [ICLR 2025 Oral] "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free"☆92Oct 15, 2024Updated last year
- minimalistic AI library that resembles HF's transformers☆13Dec 31, 2024Updated last year
- 🚀🤗 A collection of templates for Hugging Face Spaces☆34Oct 9, 2023Updated 2 years ago