Nicolas-BZRD / llm-recipes
☆24 · Updated 8 months ago
Related projects
Alternatives and complementary repositories for llm-recipes
- ☆10 · Updated 9 months ago
- Repo for the EMNLP'24 paper "Dual-Space Knowledge Distillation for Large Language Models"☆37 · Updated 2 weeks ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…☆44 · Updated last year
- DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization (ACL 2022)☆50 · Updated last year
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)☆53 · Updated last month
- ☆35 · Updated 9 months ago
- Repository for "Propagating Knowledge Updates to LMs Through Distillation" (NeurIPS 2023)☆24 · Updated 3 months ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"☆28 · Updated 7 months ago
- ☆25 · Updated last year
- Code for the paper "Patch-Level Training for Large Language Models"☆72 · Updated last week
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models☆73 · Updated 8 months ago
- A Closer Look into Mixture-of-Experts in Large Language Models☆40 · Updated 3 months ago
- Scaling Sparse Fine-Tuning to Large Language Models☆17 · Updated 9 months ago
- ☆22 · Updated 3 weeks ago
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024)☆139 · Updated 2 months ago
- ☆64 · Updated last month
- Is gradient information useful for pruning LLMs?☆38 · Updated 7 months ago
- Repository for sparse fine-tuning of LLMs via a modified version of MosaicML's llmfoundry☆38 · Updated 10 months ago
- ☆19 · Updated this week
- [EMNLP 2023] Context Compression for Auto-regressive Transformers with Sentinel Tokens☆21 · Updated last year
- Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval☆38 · Updated last month
- We introduce ScaleQuest, a scalable, novel, and cost-effective data synthesis method to unleash the reasoning capability of LLMs.☆51 · Updated 3 weeks ago
- Long Context Extension and Generalization in LLMs☆39 · Updated 2 months ago
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023)☆35 · Updated 7 months ago
- This repository combines the CPO and SimPO methods for improved reference-free preference learning.☆35 · Updated 3 months ago
- An Efficient LLM Fine-Tuning Factory Optimized for MoE PEFT☆46 · Updated this week
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts☆34 · Updated 8 months ago
- Codebase for Instruction Following without Instruction Tuning☆32 · Updated 2 months ago
- A collection of instruction data and scripts for machine translation.☆20 · Updated last year