☆35 · Apr 8, 2025 · Updated 10 months ago
Alternatives and similar repositories for matrix
Users that are interested in matrix are comparing it to the libraries listed below
- Combining SOAP and MUON · ☆19 · Feb 11, 2025 · Updated last year
- ☆20 · May 30, 2024 · Updated last year
- Unofficial implementation of paper: Exploring the Space of Key-Value-Query Models with Intention · ☆12 · May 24, 2023 · Updated 2 years ago
- ☆12 · Jan 29, 2021 · Updated 5 years ago
- ☆50 · Jun 16, 2025 · Updated 8 months ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" · ☆18 · Mar 15, 2024 · Updated last year
- Code implementation for: From Virtual Games to Real-World Play · ☆46 · Jun 23, 2025 · Updated 8 months ago
- Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without… · ☆21 · Mar 15, 2025 · Updated 11 months ago
- Mamba support for transformer lens · ☆19 · Sep 17, 2024 · Updated last year
- [ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang · ☆14 · Jan 4, 2024 · Updated 2 years ago
- BigKnow2022: Bringing Language Models Up to Speed · ☆16 · Mar 27, 2023 · Updated 2 years ago
- [ICML 2024] Serving LLMs on heterogeneous decentralized clusters. · ☆34 · May 6, 2024 · Updated last year
- ☆66 · Jul 8, 2025 · Updated 7 months ago
- PyCUDA based PyTorch Extension Made Easy · ☆27 · Mar 22, 2024 · Updated last year
- The official repo for "OpenMoE 2: Sparse Diffusion Language Models". · ☆52 · Dec 28, 2025 · Updated 2 months ago
- Tiny-FSDP, a minimalistic re-implementation of the PyTorch FSDP · ☆99 · Aug 20, 2025 · Updated 6 months ago
- ☆24 · Sep 25, 2024 · Updated last year
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) · ☆24 · Jun 6, 2024 · Updated last year
- ☆29 · Nov 30, 2021 · Updated 4 years ago
- Checkpointable dataset utilities for foundation model training · ☆32 · Jan 29, 2024 · Updated 2 years ago
- "Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiw… · ☆31 · May 7, 2024 · Updated last year
- train with kittens! · ☆63 · Oct 25, 2024 · Updated last year
- Code of the paper "FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Princi… · ☆28 · Aug 26, 2025 · Updated 6 months ago
- ☆33 · Aug 9, 2024 · Updated last year
- Adaptive Local Implicit Image Function for Arbitrary-scale Super-resolution, accepted by the International Conference on Image Processing… · ☆21 · Nov 2, 2022 · Updated 3 years ago
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" · ☆27 · Apr 17, 2024 · Updated last year
- Official Repo for Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics · ☆71 · Jan 13, 2026 · Updated last month
- ☆33 · Oct 4, 2024 · Updated last year
- Code release for paper "Test-Time Training Done Right" · ☆379 · Jan 5, 2026 · Updated last month
- Longitudinal Evaluation of LLMs via Data Compression · ☆33 · May 29, 2024 · Updated last year
- FlexAttention w/ FlashAttention3 Support · ☆27 · Oct 5, 2024 · Updated last year
- DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training (ICLR 2023) · ☆32 · Apr 8, 2023 · Updated 2 years ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. · ☆91 · Jul 17, 2025 · Updated 7 months ago
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer · ☆41 · Jan 29, 2026 · Updated last month
- 🤗 Unofficial huggingface/diffusers-based implementation of the paper "Training-Free Layout Control with Cross-Attention Guidance". · ☆42 · May 24, 2023 · Updated 2 years ago
- Official repository for CVPR'23 paper: Detecting Backdoors in Pre-trained Encoders · ☆36 · Sep 25, 2023 · Updated 2 years ago
- ☆163 · Jan 6, 2025 · Updated last year
- [CVPR'25] MergeVQ: A Unified Framework for Visual Generation and Representation with Token Merging and Quantization · ☆47 · Jul 22, 2025 · Updated 7 months ago
- Official code for the paper "Attention as a Hypernetwork" · ☆51 · Updated this week