Nicolas-BZRD / llm-recipes
☆28 · Updated last year
Alternatives and similar repositories for llm-recipes:
Users interested in llm-recipes are comparing it to the repositories listed below.
- ☆10 · Updated 2 months ago
- Code for the paper "Patch-Level Training for Large Language Models" ☆82 · Updated 5 months ago
- [ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models ☆40 · Updated 5 months ago
- DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization (ACL 2022) ☆50 · Updated last year
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆76 · Updated last year
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆51 · Updated 2 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆51 · Updated 2 years ago
- This repository combines the CPO and SimPO methods for improved reference-free preference learning. ☆53 · Updated 8 months ago
- Repository for "Propagating Knowledge Updates to LMs Through Distillation" (NeurIPS 2023). ☆25 · Updated 8 months ago
- An Experiment on Dynamic NTK Scaling RoPE ☆63 · Updated last year
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) ☆207 · Updated last month
- Repo for the EMNLP'24 paper "Dual-Space Knowledge Distillation for Large Language Models". A general white-box KD framework for both same… ☆47 · Updated 5 months ago
- Code for the paper "NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models" (EMNLP… ☆15 · Updated last year
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆116 · Updated last year
- MultilingualSIFT: Multilingual Supervised Instruction Fine-tuning ☆90 · Updated last year
- [ICML 2024] Selecting High-Quality Data for Training Language Models ☆168 · Updated 10 months ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆121 · Updated 3 months ago
- Official implementation of the ICML 2024 paper RoSA (Robust Adaptation) ☆40 · Updated last year
- Training code for Baby-Llama, our submission to the strict-small track of the BabyLM challenge. ☆79 · Updated last year
- Is gradient information useful for pruning of LLMs? ☆43 · Updated last year
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies (https://arxiv.org/abs/2407.13623) ☆83 · Updated 7 months ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ☆27 · Updated last year
- Unofficial implementation of AlpaGasus ☆90 · Updated last year
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆55 · Updated this week
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ☆57 · Updated 6 months ago
- ☆14 · Updated 5 months ago
- ☆49 · Updated last year
- Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models ☆140 · Updated 2 years ago
- Code for "SemDeDup", a simple method for identifying and removing semantic duplicates from a dataset (data pairs which are semantically s… ☆134 · Updated last year
- The code of the paper "Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation", published at NeurIPS 202… ☆46 · Updated 2 years ago