Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper
☆82Oct 30, 2021Updated 4 years ago
Alternatives and similar repositories for ponder-transformer
Users that are interested in ponder-transformer are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing☆49Jan 27, 2022Updated 4 years ago
- An implementation of (Induced) Set Attention Block, from the Set Transformers paper☆67Jan 10, 2023Updated 3 years ago
- Implementation of a Transformer, but completely in Triton☆279Apr 5, 2022Updated 3 years ago
- Visual search interface☆11Nov 30, 2021Updated 4 years ago
- Implementation of Multistream Transformers in Pytorch☆54Jul 31, 2021Updated 4 years ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- A simple Transformer where the softmax has been replaced with normalization☆20Sep 11, 2020Updated 5 years ago
- A GPT, made only of MLPs, in Jax☆59Jun 23, 2021Updated 4 years ago
- Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group☆37Sep 23, 2024Updated last year
- slowly building a set of infinite riddle generators for data-hungry methods☆14Nov 15, 2022Updated 3 years ago
- Code for reproducing the results from the paper Avoiding Side Effects in Complex Environments☆12Jun 3, 2021Updated 4 years ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch☆25Jan 21, 2025Updated last year
- Explorations into adversarial losses on top of autoregressive loss for language modeling☆41Dec 21, 2025Updated 3 months ago
- ☆21Mar 15, 2023Updated 3 years ago
- Explorations into NEAT and some of its derivative research☆33Dec 14, 2025Updated 3 months ago
- NordVPN Threat Protection Pro™ • AdTake your cybersecurity to the next level. Block phishing, malware, trackers, and ads. Lightweight app that works with all browsers.
- ☆11Jun 2, 2021Updated 4 years ago
- zero-vocab or low-vocab embeddings☆18Jul 17, 2022Updated 3 years ago
- Implementation of Tranception, an attention network, paired with retrieval, that is SOTA for protein fitness prediction☆32Jun 19, 2022Updated 3 years ago
- A Pytorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering☆43Nov 8, 2020Updated 5 years ago
- Usable implementation of Mogrifier, a circuit for enhancing LSTMs and potentially other networks, from Deepmind☆22Jun 9, 2024Updated last year
- A PyTorch wrapper of parallel exclusive scan in CUDA☆12May 25, 2023Updated 2 years ago
- A machine learning library capable of training various deep neural networks (RNNs, LSTMs, DBNs, ect...) on a GPU. It makes use of auto-di…☆10Aug 28, 2018Updated 7 years ago
- Implementation of Universal Transformer in Pytorch☆267Nov 19, 2018Updated 7 years ago
- Implementation of Fast Transformer in Pytorch☆176Aug 26, 2021Updated 4 years ago
- Wordpress hosting with auto-scaling on Cloudways • AdFully Managed hosting built for WordPress-powered businesses that need reliable, auto-scalable hosting. Cloudways SafeUpdates now available.
- Implementation of Soft Actor Critic and some of its improvements in Pytorch☆63Dec 29, 2025Updated 2 months ago
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena☆207Aug 26, 2023Updated 2 years ago
- Implementation of a holodeck, written in Pytorch☆18Nov 1, 2023Updated 2 years ago
- Neural Turing Machines in pytorch☆48Dec 30, 2021Updated 4 years ago
- DALLE-tools provided useful dataset utilities to improve you workflow with WebDatasets.☆14Mar 9, 2022Updated 4 years ago
- Very deep VAEs in JAX/Flax☆46Jun 16, 2021Updated 4 years ago
- Implementation of the Adan (ADAptive Nesterov momentum algorithm) Optimizer in Pytorch☆253Sep 1, 2022Updated 3 years ago
- MXNet/Gluon implement of L-GM-Loss☆11Oct 17, 2018Updated 7 years ago
- Axial Positional Embedding for Pytorch☆84Feb 25, 2025Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch☆46Mar 3, 2021Updated 5 years ago
- Code to reproduce the results for Compositional Attention☆59Nov 16, 2022Updated 3 years ago
- My own attempt at a long context genomics model, leveraging recent advances in long context attention modeling (Flash Attention + other h…☆54Jul 2, 2023Updated 2 years ago
- An attempt at the implementation of Glom, Geoffrey Hinton's new idea that integrates concepts from neural fields, top-down-bottom-up proc…☆196Mar 27, 2021Updated 5 years ago
- ☆27Mar 13, 2021Updated 5 years ago
- My attempts at applying Soundstream design on learned tokenization of text and then applying hierarchical attention to text generation☆90Oct 11, 2024Updated last year
- A python library for highly configurable transformers - easing model architecture search and experimentation.☆48Nov 30, 2021Updated 4 years ago