BorealisAI / flora-opt
This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" (ICML 2024).
⭐ 104 · Updated 11 months ago
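The paper's core claim is that a LoRA-style low-rank update behaves like compressing the gradient with a random down-projection and re-expanding it when the weight is updated, so optimizer state (e.g. momentum) can be stored in the compressed space. Below is a minimal NumPy sketch of that idea under stated assumptions, not the flora-opt API: the projection `P`, rank `r`, and the plain SGD-with-momentum loop are illustrative choices, and the paper's additional ingredient of periodically resampling the projection is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 256, 512, 8            # layer shape and compression rank (toy values)
W = rng.normal(size=(d_out, d_in))      # weight matrix being trained

# Hypothetical random down-projection used to compress gradients; scaled so that
# P.T @ P is approximately the identity in expectation.
P = rng.normal(size=(r, d_in)) / np.sqrt(r)

momentum_c = np.zeros((d_out, r))       # momentum kept entirely in the compressed space
lr, beta = 1e-2, 0.9

def step(grad_W):
    """One SGD-with-momentum step that only ever stores a compressed optimizer state."""
    global W, momentum_c
    g_c = grad_W @ P.T                    # compress: (d_out, d_in) -> (d_out, r)
    momentum_c = beta * momentum_c + g_c  # accumulate in the small space
    W -= lr * (momentum_c @ P)            # decompress and apply the low-rank update

# Toy usage with a random "gradient" in place of a real backward pass.
step(rng.normal(size=(d_out, d_in)))
print(W.shape, momentum_c.shape)          # (256, 512) (256, 8)
```

The point of the sketch is the memory footprint: the momentum buffer is d_out × r instead of d_out × d_in, which is where the compression saving comes from.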
Alternatives and similar repositories for flora-opt
Users who are interested in flora-opt are comparing it to the libraries listed below.
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch ⭐ 175 · Updated this week
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ⭐ 127 · Updated 10 months ago
- [NeurIPS 2024] Official repository of "The Mamba in the Llama: Distilling and Accelerating Hybrid Models" ⭐ 221 · Updated last month
- This repo is based on https://github.com/jiaweizzhao/GaLore ⭐ 28 · Updated 9 months ago
- Token Omission Via Attention ⭐ 128 · Updated 8 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ⭐ 98 · Updated 8 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ⭐ 239 · Updated 4 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ⭐ 163 · Updated last year
- EvaByte: Efficient Byte-level Language Models at Scale ⭐ 102 · Updated 2 months ago
- Mixture of A Million Experts ⭐ 46 · Updated 10 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ⭐ 144 · Updated 9 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ⭐ 134 · Updated this week
- Code accompanying the paper "Massive Activations in Large Language Models" ⭐ 163 · Updated last year
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ⭐ 85 · Updated this week
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ⭐ 149 · Updated 2 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ⭐ 235 · Updated 2 weeks ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ⭐ 126 · Updated 6 months ago
- Some preliminary explorations of Mamba's context scaling. ⭐ 214 · Updated last year
- [ICLR 2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ⭐ 213 · Updated 3 weeks ago
- PB-LLM: Partially Binarized Large Language Models ⭐ 152 · Updated last year
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ⭐ 173 · Updated 2 months ago
- Model Stock: All we need is just a few fine-tuned models ⭐ 117 · Updated 9 months ago