Outsider565 / LoRA-GA
☆125 Updated this week
Related projects:
- ☆139 Updated 2 months ago
- LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment ☆190 Updated 4 months ago
- AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning (ICLR 2023). ☆253 Updated last year
- PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models ☆244 Updated last month
- ☆109 Updated last month
- [SIGIR'24] The official implementation code of MOELoRA. ☆113 Updated last month
- A generalized framework for subspace tuning methods in parameter efficient fine-tuning. ☆59 Updated this week
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs" ☆224 Updated 2 months ago
- Official code for our paper, "LoRA-Pro: Are Low-Rank Adapters Properly Optimized?" ☆49 Updated last month
- Rectified Rotary Position Embeddings ☆329 Updated 3 months ago
- A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to… ☆45 Updated last year
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆133 Updated 3 months ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆114 Updated 2 months ago
- ☆119 Updated last week
- Low-bit optimizers for PyTorch ☆109 Updated 11 months ago
- ☆54 Updated 2 months ago
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning". ☆81 Updated 3 weeks ago
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" ☆120 Updated 4 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆71 Updated 6 months ago
- LongAlign: A Recipe for Long Context Alignment Encompassing Data, Training, and Evaluation ☆194 Updated 4 months ago
- Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference" ☆59 Updated 2 months ago
- ☆169 Updated 9 months ago
- Official implementation of TransNormerLLM: A Faster and Better LLM ☆223 Updated 7 months ago
- [ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning ☆337 Updated 2 months ago
- [ICLR 2024] EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling (https://arxiv.org/abs/2310.04691) ☆111 Updated 6 months ago
- ☆71 Updated 8 months ago
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales ☆30 Updated last year
- ☆87 Updated 4 months ago
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models ☆182 Updated 4 months ago
- The source code of the EMNLP 2023 main conference paper: Sparse Low-rank Adaptation of Pre-trained Language Models. ☆62 Updated 6 months ago