☆260 · Jun 6, 2025 · Updated 11 months ago
Alternatives and similar repositories for meliad
Users who are interested in meliad are comparing it to the libraries listed below.
- Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate … ☆644 · Jul 17, 2023 · Updated 2 years ago
- Implementation of Block Recurrent Transformer - Pytorch ☆226 · Aug 20, 2024 · Updated last year
- The official Languini Kitchen repository ☆14 · May 6, 2024 · Updated 2 years ago
- ☆54 · Jan 19, 2023 · Updated 3 years ago
- Source-to-Source Debuggable Derivatives in Pure Python ☆15 · Jan 23, 2024 · Updated 2 years ago
- Sequence modeling with Mega. ☆303 · Jan 28, 2023 · Updated 3 years ago
- Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit ☆63 · Jun 21, 2023 · Updated 2 years ago
- ☆13 · Aug 23, 2024 · Updated last year
- FlexAttention w/ FlashAttention3 Support ☆27 · Oct 5, 2024 · Updated last year
- An implementation of local windowed attention for language modeling ☆499 · Jul 16, 2025 · Updated 10 months ago
- playing with gpt4 ☆13 · Mar 17, 2023 · Updated 3 years ago
- Convenient Text-to-Text Training for Transformers ☆18 · Dec 10, 2021 · Updated 4 years ago
- ☆23 · Oct 15, 2022 · Updated 3 years ago
- HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023] ☆14 · Jul 11, 2023 · Updated 2 years ago
- Convolutions for Sequence Modeling ☆912 · Jun 13, 2024 · Updated last year
- Understand and test language model architectures on synthetic tasks. ☆268 · Mar 22, 2026 · Updated 2 months ago
- Sequence Modeling with Structured State Spaces ☆67 · Aug 2, 2022 · Updated 3 years ago
- Large Context Attention ☆772 · Oct 13, 2025 · Updated 7 months ago
- ☆10 · Dec 17, 2020 · Updated 5 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Oct 9, 2022 · Updated 3 years ago
- ☆78 · Dec 7, 2023 · Updated 2 years ago
- Scripts for downloading and pre-processing the `proof-pile`, a high quality dataset of mathematical text and code. ☆22 · Nov 26, 2022 · Updated 3 years ago
- Task-based datasets, preprocessing, and evaluation for sequence models. ☆593 · May 12, 2026 · Updated 2 weeks ago
- ☆19 · Dec 4, 2025 · Updated 5 months ago
- Open weights language model from Google DeepMind, based on Griffin. ☆677 · Feb 6, 2026 · Updated 3 months ago
- Official code for Long Expressive Memory (ICLR 2022, Spotlight) ☆71 · Mar 11, 2022 · Updated 4 years ago
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification ☆11 · Aug 12, 2023 · Updated 2 years ago
- Code for the paper "Query-Key Normalization for Transformers" ☆53 · Mar 6, 2021 · Updated 5 years ago
- ☆20 · May 30, 2024 · Updated last year
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆69 · Apr 24, 2024 · Updated 2 years ago
- Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations" ☆13 · Dec 14, 2021 · Updated 4 years ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆254 · Jun 6, 2025 · Updated 11 months ago
- [ICLR 2023] Official implementation of Transnormer in our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling ☆81 · Apr 24, 2024 · Updated 2 years ago
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 · Jul 28, 2022 · Updated 3 years ago
- Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch ☆878 · Oct 30, 2023 · Updated 2 years ago
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆207 · Aug 26, 2023 · Updated 2 years ago
- ☆13 · Jun 16, 2021 · Updated 4 years ago
- ☆13 · Feb 7, 2023 · Updated 3 years ago
- Recurrent Memory Transformer ☆158 · Aug 14, 2023 · Updated 2 years ago