Official implementation of the transformer (TF) architecture proposed in the paper "Looped Transformers as Programmable Computers"
☆30Apr 8, 2023Updated 2 years ago
Alternatives and similar repositories for Looped-Transformer
Users who are interested in Looped-Transformer are comparing it to the repositories listed below
- ☆35Dec 12, 2023Updated 2 years ago
- ☆20Mar 1, 2023Updated 3 years ago
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification☆11Aug 12, 2023Updated 2 years ago
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"☆27Apr 17, 2024Updated last year
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"☆18Mar 15, 2024Updated last year
- ☆10Oct 28, 2024Updated last year
- Created by Francisco Angulo de Lafuente ⚡️Deploy the DEMO⬇️☆20Jan 1, 2025Updated last year
- Sampling-Based Minimum Bayes-Risk Decoding for Neural Machine Translation☆16Oct 14, 2022Updated 3 years ago
- PyTorch implementation of StableMask (ICML'24)☆15Jun 27, 2024Updated last year
- Representation Learning in RL☆13Jun 1, 2022Updated 3 years ago
- Official repository of "Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models" [ICML 2023]☆23Jan 10, 2025Updated last year
- Code for the paper: https://arxiv.org/pdf/2309.06979.pdf☆21Jul 29, 2024Updated last year
- ☆18Jul 10, 2022Updated 3 years ago
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun☆57Mar 10, 2025Updated 11 months ago
- (NeurIPS '22) LISA: Learning Interpretable Skill Abstractions - A framework for unsupervised skill learning using Imitation☆29Feb 22, 2023Updated 3 years ago
- [CoLM 24] Official Repository of MambaByte: Token-free Selective State Space Model☆24Oct 12, 2024Updated last year
- Generative Equilibrium Transformer☆27Nov 11, 2023Updated 2 years ago
- Implementation of ICML 2023 paper: Future-conditioned Unsupervised Pretraining for Decision Transformer☆29Jul 25, 2023Updated 2 years ago
- Benchmark API for Multidomain Language Modeling☆25Aug 26, 2022Updated 3 years ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…☆31Aug 25, 2023Updated 2 years ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models☆35Jun 12, 2024Updated last year
- Code and data for "A Systematic Assessment of Syntactic Generalization in Neural Language Models"☆29Jun 18, 2021Updated 4 years ago
- Adding new tasks to T0 without catastrophic forgetting☆33Oct 20, 2022Updated 3 years ago
- ☆13Oct 5, 2025Updated 4 months ago
- Official code for the ICLR 2020 paper 'ARE PRE-TRAINED LANGUAGE MODELS AWARE OF PHRASES? SIMPLE BUT STRONG BASELINES FOR GRAMMAR INDUCTIO…☆30Jun 12, 2023Updated 2 years ago
- ☆35Apr 12, 2024Updated last year
- Transformers are Meta-Reinforcement Learners - International Conference on Machine Learning (ICML) 2022☆67May 8, 2023Updated 2 years ago
- [CVPR 2023] Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference☆30Mar 14, 2024Updated last year
- [CoRL 2020] COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning☆34Oct 28, 2020Updated 5 years ago
- ☆35Jan 29, 2023Updated 3 years ago
- 🪝PISCES - Precise In-Parameter Suppression for Concept EraSure in Large Language Models☆12May 30, 2025Updated 9 months ago
- [ICLR2026] AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model☆53Oct 12, 2025Updated 4 months ago
- Jeroen Cottaar's work for the Kaggle Geophysical Waveform Inversion competition (2nd place)☆11Aug 11, 2025Updated 6 months ago
- ☆52Oct 23, 2023Updated 2 years ago
- Interpreting the latent space representations of attention head outputs for LLMs☆39Aug 13, 2024Updated last year
- [CVPR2025] Breaking the Low-Rank Dilemma of Linear Attention☆38Mar 11, 2025Updated 11 months ago
- A Caffe/C++ implementation of Deep Deterministic Policy Gradient☆10Feb 1, 2019Updated 7 years ago
- ADAPTIVE RESONANCE THEORY. Gail A. Carpenter and Stephen Grossberg☆10Feb 10, 2015Updated 11 years ago
- ☆11Jan 15, 2021Updated 5 years ago