xichen-fy / Fira
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?
☆89 · Updated 2 months ago
Alternatives and similar repositories for Fira:
Users interested in Fira are comparing it to the repositories listed below.
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" · ☆70 · Updated 7 months ago
- ☆50 · Updated last month
- The official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction" · ☆42 · Updated 3 months ago
- An Efficient LLM Fine-Tuning Factory Optimized for MoE PEFT · ☆61 · Updated 2 weeks ago
- The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression" · ☆110 · Updated last month
- "Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" by Zhenyu Zhang, Runjin Chen, Shiw… · ☆25 · Updated 8 months ago
- Official code for our paper "LoRA-Pro: Are Low-Rank Adapters Properly Optimized?" · ☆95 · Updated 2 months ago
- Activation-aware Singular Value Decomposition for Compressing Large Language Models · ☆55 · Updated 2 months ago
- ☆169 · Updated 2 months ago
- Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs · ☆76 · Updated last month
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… · ☆60 · Updated 9 months ago
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models" · ☆44 · Updated 6 months ago
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) · ☆51 · Updated last month
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" · ☆145 · Updated 7 months ago
- [ICML 2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely · ☆20 · Updated 6 months ago
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models · ☆75 · Updated 7 months ago
- ☆91 · Updated 6 months ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) · ☆85 · Updated 3 months ago
- ☆35 · Updated last month
- Source code for the paper "LongGenBench: Long-context Generation Benchmark" · ☆12 · Updated 3 months ago
- 16-fold memory access reduction with nearly no loss · ☆63 · Updated 2 months ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] · ☆130 · Updated 4 months ago
- Implementation of the NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" · ☆135 · Updated 3 months ago
- ☆121 · Updated 5 months ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models · ☆39 · Updated 2 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning · ☆91 · Updated 3 weeks ago
- State-of-the-art Parameter-Efficient MoE Fine-tuning Method · ☆120 · Updated 4 months ago
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)" · ☆147 · Updated last month
- [EMNLP 2023 Main] Sparse Low-rank Adaptation of Pre-trained Language Models · ☆70 · Updated 10 months ago
- ☆119 · Updated 5 months ago