danbider / lora-tradeoffs
Information and artifacts for "LoRA Learns Less and Forgets Less" (TMLR, 2024)
☆16 · Updated last year
Alternatives and similar repositories for lora-tradeoffs
Users interested in lora-tradeoffs are comparing it to the repositories listed below.
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆84 · Updated 11 months ago
- ☆33 · Updated 9 months ago
- ☆19 · Updated 6 months ago
- ☆86 · Updated last year
- ☆51 · Updated 7 months ago
- This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" in ICML 2024. ☆103 · Updated last year
- nanoGPT-like codebase for LLM training ☆107 · Updated 5 months ago
- ☆72 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆166 · Updated 3 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆101 · Updated last week
- ☆69 · Updated last year
- Unofficial Implementation of Selective Attention Transformer ☆17 · Updated 11 months ago
- ☆32 · Updated last year
- ☆107 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆130 · Updated 10 months ago
- Official PyTorch implementation and models for the paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod… ☆101 · Updated last month
- 📄 Small Batch Size Training for Language Models ☆63 · Updated 2 weeks ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆44 · Updated last year
- PyTorch library for Active Fine-Tuning ☆93 · Updated 3 weeks ago
- ☆83 · Updated last year
- The official GitHub repo for "Diffusion Language Models are Super Data Learners". ☆134 · Updated 2 weeks ago
- Mamba support for transformer lens ☆18 · Updated last year
- Code for studying the super weight in LLM ☆120 · Updated 10 months ago
- Latest Weight Averaging (NeurIPS HITY 2022) ☆31 · Updated 2 years ago
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ☆177 · Updated 6 months ago
- Fluid Language Model Benchmarking ☆19 · Updated last month
- ☆50 · Updated last year
- ☆12 · Updated 11 months ago
- Code accompanying the paper "Massive Activations in Large Language Models" ☆184 · Updated last year
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆116 · Updated last year