Nicolas-BZRD / llm-distillation
☆10 Updated 2 months ago
Alternatives and similar repositories for llm-distillation:
Users who are interested in llm-distillation are comparing it to the repositories listed below.
- ☆28 Updated last year
- Repository for "Propagating Knowledge Updates to LMs Through Distillation" (NeurIPS 2023). ☆25 Updated 8 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆51 Updated 2 years ago
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training ☆20 Updated 8 months ago
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So… ☆16 Updated this week
- ☆25 Updated last year
- Exploration of automated dataset selection approaches at large scales. ☆39 Updated last month
- ☆16 Updated 6 months ago
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ☆57 Updated 6 months ago
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆51 Updated 2 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆53 Updated last month
- ☆43 Updated 8 months ago
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation ☆42 Updated 2 months ago
- Long Context Extension and Generalization in LLMs ☆53 Updated 7 months ago
- Benchmarking Benchmark Leakage in Large Language Models ☆51 Updated 11 months ago
- Official implementation of the paper: "A deeper look at depth pruning of LLMs" ☆15 Updated 9 months ago
- ☆38 Updated last year
- ☆28 Updated last year
- Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs" ☆23 Updated 7 months ago
- Code for the paper "NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models" (EMNLP… ☆16 Updated last year
- Official implementation of the ICML 2024 paper RoSA (Robust Adaptation) ☆40 Updated last year
- Code for "Tracing Knowledge in Language Models Back to the Training Data" ☆37 Updated 2 years ago
- Learning adapter weights from task descriptions ☆17 Updated last year
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆25 Updated last year
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?" ☆56 Updated 2 years ago
- [NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions" ☆36 Updated 11 months ago
- Retrieval as Attention ☆83 Updated 2 years ago
- Adding new tasks to T0 without catastrophic forgetting ☆33 Updated 2 years ago
- Lightweight tool to identify Data Contamination in LLMs evaluation ☆50 Updated last year
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025] ☆29 Updated 3 months ago