☆36Dec 12, 2023Updated 2 years ago
Alternatives and similar repositories for looped_transformer
Users who are interested in looped_transformer are comparing it to the libraries listed below
- Official implementation of the transformer (TF) architecture suggested in a paper entitled "Looped Transformers as Programmable Computers…☆35Apr 8, 2023Updated 2 years ago
- Generative Equilibrium Transformer☆27Nov 11, 2023Updated 2 years ago
- Residual vector quantization for KV cache compression in large language models☆11Oct 22, 2024Updated last year
- Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure (NeurIPS 2024) + Arithmetic Transfor…☆14Oct 26, 2025Updated 4 months ago
- ☆24Mar 2, 2026Updated last week
- Combining SOAP and MUON☆19Feb 11, 2025Updated last year
- ☆11Jun 29, 2021Updated 4 years ago
- Code accompanying the paper "A contrastive rule for meta-learning"☆13Oct 31, 2024Updated last year
- 💻 Terminal-Agent with Human-in-the-Loop Learning☆35Jan 16, 2026Updated last month
- ☆12Sep 18, 2024Updated last year
- ☆36Feb 12, 2025Updated last year
- Learning Accurate Decision Trees with Bandit Feedback via Quantized Gradient Descent☆17Sep 8, 2022Updated 3 years ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches☆15Jun 21, 2019Updated 6 years ago
- Mamba support for transformer lens☆19Sep 17, 2024Updated last year
- PyTorch implementation of StableMask (ICML'24)☆15Jun 27, 2024Updated last year
- Official repo of paper LM2☆47Feb 13, 2025Updated last year
- ☆20Mar 1, 2023Updated 3 years ago
- ☆45Apr 30, 2018Updated 7 years ago
- ☆20Oct 25, 2022Updated 3 years ago
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun☆57Mar 10, 2025Updated 11 months ago
- Manually implemented quantization-aware training☆23Oct 12, 2022Updated 3 years ago
- ☆27Feb 1, 2023Updated 3 years ago
- benchmarking some transformer deployments☆26Dec 15, 2025Updated 2 months ago
- Omnigrok: Grokking Beyond Algorithmic Data☆63Feb 24, 2023Updated 3 years ago
- Fast matrix multiplication for few-bit integer matrices on CPUs.☆28Mar 19, 2019Updated 6 years ago
- Educational verilog library that supports IEEE754 floating point arithmetic with a parametrizable mantissa and exponent☆32Mar 13, 2025Updated 11 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024)☆199May 28, 2024Updated last year
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models☆35Jun 12, 2024Updated last year
- Official Repository for Spherical Voronoi: Directional Appearance as a Differentiable Partition of the Sphere☆72Jan 29, 2026Updated last month
- Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022☆31May 29, 2023Updated 2 years ago
- PyTorch implementation for our ICLR 2024 paper "Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory…☆26Dec 21, 2023Updated 2 years ago
- ☆35Apr 12, 2024Updated last year
- A Learnable LSH Framework for Efficient NN Training☆34Jul 22, 2021Updated 4 years ago
- Wrappers for open source FPU hardware implementations.☆37Nov 27, 2025Updated 3 months ago
- QJL: 1-Bit Quantized JL transform for KV Cache Quantization with Zero Overhead☆32Jan 27, 2025Updated last year
- BitLinear implementation☆35Jan 1, 2026Updated 2 months ago
- ☆33Oct 31, 2024Updated last year
- [CVPR 2023] Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference☆30Mar 14, 2024Updated last year
- Learning Universal Predictors☆81Aug 1, 2024Updated last year