☆259 · Jun 6, 2025 (updated 9 months ago)
Alternatives and similar repositories for meliad
Users interested in meliad are comparing it to the libraries listed below.
- Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate … — ☆641 · Jul 17, 2023 (updated 2 years ago)
- Implementation of Block Recurrent Transformer - Pytorch — ☆224 · Aug 20, 2024 (updated last year)
- The official Languini Kitchen repository — ☆14 · May 6, 2024 (updated last year)
- ☆53 · Jan 19, 2023 (updated 3 years ago)
- Source-to-Source Debuggable Derivatives in Pure Python — ☆15 · Jan 23, 2024 (updated 2 years ago)
- Sequence modeling with Mega. — ☆303 · Jan 28, 2023 (updated 3 years ago)
- Demonstration that fine-tuning a RoPE model on longer sequences than the pre-trained model adapts the model's context limit — ☆63 · Jun 21, 2023 (updated 2 years ago)
- ☆13 · Aug 23, 2024 (updated last year)
- FlexAttention w/ FlashAttention3 Support — ☆27 · Oct 5, 2024 (updated last year)
- An implementation of local windowed attention for language modeling — ☆498 · Jul 16, 2025 (updated 8 months ago)
- playing with gpt4 — ☆14 · Mar 17, 2023 (updated 3 years ago)
- Convenient Text-to-Text Training for Transformers — ☆19 · Dec 10, 2021 (updated 4 years ago)
- ☆23 · Oct 15, 2022 (updated 3 years ago)
- HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023] — ☆14 · Jul 11, 2023 (updated 2 years ago)
- Understand and test language model architectures on synthetic tasks. — ☆263 (updated this week)
- Convolutions for Sequence Modeling — ☆911 · Jun 13, 2024 (updated last year)
- Sequence Modeling with Structured State Spaces — ☆67 · Aug 2, 2022 (updated 3 years ago)
- Large Context Attention — ☆769 · Oct 13, 2025 (updated 5 months ago)
- ☆10 · Dec 17, 2020 (updated 5 years ago)
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights — ☆19 · Oct 9, 2022 (updated 3 years ago)
- ☆78 · Dec 7, 2023 (updated 2 years ago)
- Code for the paper "Query-Key Normalization for Transformers" — ☆52 · Mar 6, 2021 (updated 5 years ago)
- Scripts for downloading and pre-processing the `proof-pile`, a high-quality dataset of mathematical text and code. — ☆22 · Nov 26, 2022 (updated 3 years ago)
- Task-based datasets, preprocessing, and evaluation for sequence models. — ☆594 · Mar 9, 2026 (updated 2 weeks ago)
- Open weights language model from Google DeepMind, based on Griffin. — ☆665 · Feb 6, 2026 (updated last month)
- ☆19 · Dec 4, 2025 (updated 3 months ago)
- Official code for Long Expressive Memory (ICLR 2022, Spotlight) — ☆70 · Mar 11, 2022 (updated 4 years ago)
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification — ☆11 · Aug 12, 2023 (updated 2 years ago)
- ☆20 · May 30, 2024 (updated last year)
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… — ☆68 · Apr 24, 2024 (updated last year)
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" — ☆250 · Jun 6, 2025 (updated 9 months ago)
- Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations" — ☆13 · Dec 14, 2021 (updated 4 years ago)
- [ICLR 2023] Official implementation of Transnormer in our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling — ☆81 · Apr 24, 2024 (updated last year)
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP — ☆58 · Jul 28, 2022 (updated 3 years ago)
- Implementation of RETRO, DeepMind's retrieval-based attention net, in Pytorch — ☆879 · Oct 30, 2023 (updated 2 years ago)
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena — ☆207 · Aug 26, 2023 (updated 2 years ago)
- ☆13 · Jun 16, 2021 (updated 4 years ago)
- ☆13 · Feb 7, 2023 (updated 3 years ago)
- Recurrent Memory Transformer — ☆156 · Aug 14, 2023 (updated 2 years ago)