sebulo / LoQT
☆27 · Updated this week
Related projects
Alternatives and complementary repositories for LoQT
- Here we will test various linear attention designs. ☆56 · Updated 6 months ago
- Official Implementation Of The Paper: "DeciMamba: Exploring the Length Extrapolation Potential of Mamba" ☆20 · Updated 3 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore (see the low-rank gradient sketch after this list) ☆18 · Updated last month
- This repository contains code for the MicroAdam paper. ☆12 · Updated 4 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆43 · Updated 3 months ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆25 · Updated 4 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆91 · Updated last month
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆48 · Updated 2 months ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference…" ☆18 · Updated 11 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" ☆56 · Updated 3 weeks ago
- Activation-aware Singular Value Decomposition for Compressing Large Language Models ☆49 · Updated 2 weeks ago
- Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs ☆71 · Updated 4 months ago
- DPO, but faster 🚀 ☆20 · Updated last week
- Utilities for Training Very Large Models ☆56 · Updated last month
- Using FlexAttention to compute attention with different masking patterns (see the FlexAttention sketch after this list) ☆40 · Updated last month
- Stick-breaking attention ☆32 · Updated last week
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆49 · Updated last month
- Official code for the paper "Attention as a Hypernetwork" ☆23 · Updated 4 months ago
- A repository for research on medium-sized language models. ☆74 · Updated 5 months ago
- Official implementation of ECCV24 paper: POA ☆24 · Updated 3 months ago
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google, in PyTorch ☆51 · Updated this week
- HGRN2: Gated Linear RNNs with State Expansion ☆48 · Updated 2 months ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ☆28 · Updated 7 months ago
- Experimental scripts for researching data-adaptive learning rate scheduling. ☆23 · Updated last year
- A Closer Look into Mixture-of-Experts in Large Language Models ☆38 · Updated 3 months ago
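Several of the repositories above (LoQT itself, the GaLore fork, WeLore) revolve around projecting gradients into a low-rank subspace before the optimizer step. As a minimal sketch of that shared idea, assuming a plain SGD update and an illustrative rank, this is not the LoQT or GaLore implementation:

```python
# Minimal sketch of low-rank gradient projection (GaLore-style idea).
# Not the LoQT/GaLore code; the rank, shapes, and SGD step are assumptions.
import torch

def low_rank_sgd_step(weight: torch.Tensor, grad: torch.Tensor,
                      lr: float = 1e-3, rank: int = 4) -> torch.Tensor:
    # Orthonormal basis for the top-`rank` left singular subspace of the gradient.
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]        # (m, rank) projector
    g_low = P.T @ grad     # (rank, n): optimizer state would live in this small space
    update = P @ g_low     # project the (here: raw) update back to (m, n)
    return weight - lr * update

W = torch.randn(64, 32)
G = torch.randn(64, 32)
W = low_rank_sgd_step(W, G)
```

Keeping the optimizer state in the rank-`r` space is what yields the memory savings these methods report; the plain SGD step above stands in for that machinery.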
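For the FlexAttention entry, a minimal sketch of a causal mask expressed as a `score_mod` with PyTorch's `flex_attention` API (available since PyTorch 2.5); the tensor shapes are illustrative assumptions:

```python
# Minimal sketch: causal masking via FlexAttention (torch >= 2.5).
# Shapes are illustrative; real use typically wraps flex_attention in torch.compile.
import torch
from torch.nn.attention.flex_attention import flex_attention

def causal(score, batch, head, q_idx, kv_idx):
    # Keep scores where the query position is at or after the key position.
    return torch.where(q_idx >= kv_idx, score, float("-inf"))

q = k = v = torch.randn(1, 2, 16, 8)  # (batch, heads, seq_len, head_dim)
out = flex_attention(q, k, v, score_mod=causal)  # (1, 2, 16, 8)
```

Swapping in a different `score_mod` is what lets one kernel cover the various masking patterns that repository explores.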