☆260Jun 6, 2025Updated 11 months ago
Alternatives and similar repositories for meliad
Users that are interested in meliad are comparing it to the libraries listed below.
- Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate …☆644Jul 17, 2023Updated 2 years ago
- Implementation of Block Recurrent Transformer - Pytorch☆225Aug 20, 2024Updated last year
- The official Languini Kitchen repository☆14May 6, 2024Updated 2 years ago
- ☆53Jan 19, 2023Updated 3 years ago
- Source-to-Source Debuggable Derivatives in Pure Python☆15Jan 23, 2024Updated 2 years ago
- Sequence modeling with Mega.☆303Jan 28, 2023Updated 3 years ago
- Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit☆63Jun 21, 2023Updated 2 years ago
- ☆13Aug 23, 2024Updated last year
- FlexAttention w/ FlashAttention3 Support☆27Oct 5, 2024Updated last year
- An implementation of local windowed attention for language modeling☆499Jul 16, 2025Updated 9 months ago
- playing with gpt4☆14Mar 17, 2023Updated 3 years ago
- Convenient Text-to-Text Training for Transformers☆18Dec 10, 2021Updated 4 years ago
- ☆23Oct 15, 2022Updated 3 years ago
- HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023]☆14Jul 11, 2023Updated 2 years ago
- Convolutions for Sequence Modeling☆912Jun 13, 2024Updated last year
- Understand and test language model architectures on synthetic tasks.☆265Mar 22, 2026Updated last month
- Sequence Modeling with Structured State Spaces☆67Aug 2, 2022Updated 3 years ago
- Large Context Attention☆769Oct 13, 2025Updated 6 months ago
- ☆10Dec 17, 2020Updated 5 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights☆19Oct 9, 2022Updated 3 years ago
- ☆78Dec 7, 2023Updated 2 years ago
- Code for the paper "Query-Key Normalization for Transformers"☆52Mar 6, 2021Updated 5 years ago
- Scripts for downloading and pre-processing the `proof-pile`, a high quality dataset of mathematical text and code.☆22Nov 26, 2022Updated 3 years ago
- Task-based datasets, preprocessing, and evaluation for sequence models.☆593Apr 22, 2026Updated 2 weeks ago
- Open weights language model from Google DeepMind, based on Griffin.☆673Feb 6, 2026Updated 3 months ago
- ☆19Dec 4, 2025Updated 5 months ago
- Official code for Long Expressive Memory (ICLR 2022, Spotlight)☆70Mar 11, 2022Updated 4 years ago
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification☆11Aug 12, 2023Updated 2 years ago
- ☆20May 30, 2024Updated last year
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…☆68Apr 24, 2024Updated 2 years ago
- Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations"☆13Dec 14, 2021Updated 4 years ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"☆252Jun 6, 2025Updated 11 months ago
- [ICLR 2023] Official implementation of Transnormer in our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling☆81Apr 24, 2024Updated 2 years ago
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP☆58Jul 28, 2022Updated 3 years ago
- Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch☆880Oct 30, 2023Updated 2 years ago
- ☆13Jun 16, 2021Updated 4 years ago
- ☆13Feb 7, 2023Updated 3 years ago
- Recurrent Memory Transformer☆158Aug 14, 2023Updated 2 years ago
- Official code repository of the paper Learning Associative Inference Using Fast Weight Memory by Schlag et al.☆29Feb 25, 2021Updated 5 years ago