MatFormer repo
☆72 · Dec 9, 2024 · Updated last year
Alternatives and similar repositories for matformer
Users interested in matformer are comparing it to the libraries listed below.
- Code for "What really matters in matrix-whitening optimizers?" · ☆22 · Oct 31, 2025 · Updated 4 months ago
- The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization · ☆18 · Mar 7, 2025 · Updated last year
- Extending the Context of Pretrained LLMs by Dropping Their Positional Embedding · ☆207 · Jan 12, 2026 · Updated last month
- ☆27 · Oct 15, 2025 · Updated 4 months ago
- ☆14 · May 13, 2024 · Updated last year
- PyTorch implementation of StableMask (ICML'24) · ☆15 · Jun 27, 2024 · Updated last year
- Calculate the probability of a paper being accepted by EMNLP2023 based on score distribution of ACL2023. · ☆14 · Sep 7, 2023 · Updated 2 years ago
- Official dataset repository for "SciReviewGen: A Large-scale Dataset for Automatic Literature Review Generation." · ☆19 · Jun 4, 2023 · Updated 2 years ago
- ☆39 · Apr 15, 2024 · Updated last year
- ☆37 · Jan 26, 2024 · Updated 2 years ago
- ☆17 · Jun 11, 2025 · Updated 8 months ago
- Official repo of dataset-decomposition paper [NeurIPS 2024] · ☆21 · Jan 8, 2025 · Updated last year
- Implementation of the dilated self attention as described in "LongNet: Scaling Transformers to 1,000,000,000 Tokens" · ☆13 · Jul 23, 2023 · Updated 2 years ago
- [ICML2025] Official code for "Reinforced Lifelong Editing for Language Models" · ☆21 · Feb 23, 2025 · Updated last year
- ☆24 · Dec 11, 2024 · Updated last year
- ☆129 · Jun 6, 2025 · Updated 9 months ago
- ☆49 · Sep 26, 2025 · Updated 5 months ago
- Implementation of Influence Function approximations for differently sized ML models, using PyTorch · ☆16 · Sep 15, 2023 · Updated 2 years ago
- [ICLR'25] Code for KaSA, an official implementation of "KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models" · ☆20 · Jan 16, 2025 · Updated last year
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity". · ☆30 · Nov 12, 2024 · Updated last year
- Scalable and Stable Parallelization of Nonlinear RNNs · ☆29 · Oct 21, 2025 · Updated 4 months ago
- Adaptation of titans-pytorch to llama models on HF · ☆25 · Mar 6, 2025 · Updated last year
- ☆91 · Aug 18, 2024 · Updated last year
- Python library to use Pleias-RAG models · ☆68 · May 1, 2025 · Updated 10 months ago
- HGRN2: Gated Linear RNNs with State Expansion · ☆56 · Aug 20, 2024 · Updated last year
- [ICLR 2026] The official repository for the paper "AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning". · ☆72 · Feb 27, 2026 · Updated last week
- RADLADS training code · ☆37 · May 7, 2025 · Updated 10 months ago
- Benchmarking Benchmark Leakage in Large Language Models · ☆60 · May 20, 2024 · Updated last year
- ☆25 · Jan 30, 2025 · Updated last year
- Tooling for exact and MinHash deduplication of large-scale text datasets · ☆72 · Feb 19, 2026 · Updated 2 weeks ago
- ☆35 · Jul 10, 2025 · Updated 7 months ago
- Experiments on the impact of depth in transformers and SSMs. · ☆41 · Oct 23, 2025 · Updated 4 months ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… · ☆31 · Nov 14, 2023 · Updated 2 years ago
- AI Energy Score: Initiative to establish comparable energy efficiency ratings for AI models. · ☆38 · Dec 2, 2025 · Updated 3 months ago
- Toy reproduction of Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts · ☆31 · Sep 1, 2024 · Updated last year
- ☆109 · Jul 15, 2025 · Updated 7 months ago
- Code accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" · ☆29 · Jul 30, 2020 · Updated 5 years ago
- Engineering the state of RNN language models (Mamba, RWKV, etc.) · ☆32 · May 25, 2024 · Updated last year
- ☆35 · Apr 12, 2024 · Updated last year