Code repository for the public reproduction of the language modelling experiments in "MatFormer: Nested Transformer for Elastic Inference"
☆31 · Nov 14, 2023 · Updated 2 years ago
Alternatives and similar repositories for MatFormer-OLMo
Users that are interested in MatFormer-OLMo are comparing it to the libraries listed below.
- Codebase for adaptive continual memory ☆14 · Aug 15, 2023 · Updated 2 years ago
- The open-source Mixture of Depths code and the official implementation of the paper "Router-Tuning: A Simple and Effective Approach for E… ☆31 · Mar 26, 2026 · Updated 2 weeks ago
- [ACL 2023 Findings] Emergent Modularity in Pre-trained Transformers ☆26 · Jun 7, 2023 · Updated 2 years ago
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆29 · Jul 24, 2025 · Updated 8 months ago
- Option type implementation in C++11 ☆14 · Apr 2, 2016 · Updated 10 years ago
- ☆13 · May 21, 2023 · Updated 2 years ago
- Code repository for the paper - "Neural Priming for Sample-Efficient Adaptation" ☆14 · Nov 13, 2023 · Updated 2 years ago
- Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr… ☆66 · Nov 18, 2025 · Updated 4 months ago
- Official code for the paper "Does CLIP's Generalization Performance Mainly Stem from High Train-Test Similarity?" (ICLR 2024) ☆10 · Aug 26, 2024 · Updated last year
- A spoken version of the textual story cloze benchmark ☆22 · Aug 6, 2023 · Updated 2 years ago
- Transformers components but in Triton ☆34 · May 9, 2025 · Updated 11 months ago
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation… ☆23 · Nov 8, 2023 · Updated 2 years ago
- [EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor ☆31 · Apr 8, 2024 · Updated 2 years ago
- Pile Deduplication Code ☆18 · May 15, 2023 · Updated 2 years ago
- [EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion from our EMNLP 2023 paper - Accelerating Toeplitz… ☆14 · Oct 17, 2023 · Updated 2 years ago
- Code repository for the paper - "Matryoshka Representation Learning" ☆620 · Feb 19, 2024 · Updated 2 years ago
- GoldFinch and other hybrid transformer components ☆46 · Jul 20, 2024 · Updated last year
- Official repo for Learning to Reason for Long-Form Story Generation ☆77 · Apr 19, 2025 · Updated 11 months ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs