Implementation of RETRO, DeepMind's retrieval-based attention net, in PyTorch
☆877 Oct 30, 2023 Updated 2 years ago
Alternatives and similar repositories for RETRO-pytorch
Users that are interested in RETRO-pytorch are comparing it to the libraries listed below.
- An experimental implementation of the retrieval-enhanced language model ☆74 Dec 29, 2022 Updated 3 years ago
- Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate … ☆645 Jul 17, 2023 Updated 2 years ago
- Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways - in Jax (Equinox framework) ☆189 Jun 24, 2022 Updated 3 years ago
- Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways ☆824 Nov 9, 2022 Updated 3 years ago
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆207 Aug 26, 2023 Updated 2 years ago
- Code repository for supporting the paper "Atlas Few-shot Learning with Retrieval Augmented Language Models",(https//arxiv.org/abs/2208.03… ☆561 Apr 8, 2026 Updated 2 months ago
- A concise but complete implementation of CLIP with various experimental improvements from recent papers ☆723 Oct 16, 2023 Updated 2 years ago
- A concise but complete full-attention transformer with a set of promising experimental features from various papers ☆5,893 Jun 8, 2026 Updated last week
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ☆4,749 Jan 8, 2024 Updated 2 years ago
- OSLO: Open Source framework for Large-scale model Optimization ☆309 Aug 25, 2022 Updated 3 years ago
- Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch ☆81 Dec 4, 2022 Updated 3 years ago
- Contriever: Unsupervised Dense Information Retrieval with Contrastive Learning ☆779 Apr 7, 2023 Updated 3 years ago
- Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch ☆1,267 Oct 18, 2022 Updated 3 years ago
- ☆331 Jun 7, 2021 Updated 5 years ago
- Another attempt at a long-context / efficient transformer by me ☆38 Apr 11, 2022 Updated 4 years ago
- One stop shop for all things carp ☆58 Sep 9, 2022 Updated 3 years ago
- Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM ☆7,863 May 29, 2026 Updated 3 weeks ago
- Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning ☆167 Feb 12, 2024 Updated 2 years ago
- Automatically create Faiss knn indices with the most optimal similarity search parameters. ☆904 Nov 4, 2025 Updated 7 months ago
- A GPT, made only of MLPs, in Jax ☆59 Jun 23, 2021 Updated 4 years ago
- ☆2,969 Updated this week
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing ☆49 Jan 27, 2022 Updated 4 years ago
- Search Engines with Autoregressive Language models ☆295 Apr 4, 2023 Updated 3 years ago
- A modular RL library to fine-tune language models to human preferences ☆2,388 Mar 1, 2024 Updated 2 years ago
- Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning" ☆457 Sep 6, 2023 Updated 2 years ago
- An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries ☆7,442 Jun 11, 2026 Updated last week
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆220 Feb 13, 2023 Updated 3 years ago
- Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch ☆549 Jan 17, 2023 Updated 3 years ago
- Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. ☆2,091 Updated this week
- Fusion-in-Decoder ☆595 Oct 4, 2023 Updated 2 years ago
- Official repo to On the Generalization Ability of Retrieval-Enhanced Transformers ☆47 Jun 4, 2024 Updated 2 years ago
- Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch ☆230 Sep 6, 2024 Updated last year
- Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT ☆228 Mar 25, 2026 Updated 2 months ago
- Reformer, the efficient Transformer, in Pytorch ☆2,193 Jun 21, 2023 Updated 2 years ago
- My explorations into editing the knowledge and memories of an attention network ☆35 Dec 8, 2022 Updated 3 years ago
- Pytorch implementation of paper "Efficient Nearest Neighbor Language Models" (EMNLP 2021) ☆75 Jan 20, 2022 Updated 4 years ago
- Implementation of Tranception, an attention network, paired with retrieval, that is SOTA for protein fitness prediction ☆32 Jun 19, 2022 Updated 4 years ago
- Repo for external large-scale work ☆6,545 Apr 27, 2024 Updated 2 years ago
- Easily compute clip embeddings and build a clip retrieval system with them ☆2,774 Mar 28, 2026 Updated 2 months ago