Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method; GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model
☆33 Aug 4, 2023 Updated 2 years ago
Alternatives and similar repositories for GLMKD
Users that are interested in GLMKD are comparing it to the libraries listed below
- ☆21 Dec 5, 2022 Updated 3 years ago
- ☆13 Jan 22, 2025 Updated last year
- Official code repository for PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation, EMNLP 2023 ☆12 Dec 13, 2023 Updated 2 years ago
- ☆15 Sep 24, 2023 Updated 2 years ago
- (ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models. ☆21 Jul 13, 2022 Updated 3 years ago
- Offline Elasticsearch index generator ☆26 Apr 29, 2021 Updated 4 years ago
- Implementation of the paper "Fine-Tuning Transformers: Vocabulary Transfer" https://arxiv.org/pdf/2112.14569.pdf ☆20 Dec 28, 2021 Updated 4 years ago
- [ICASSP-2021] Official implementations of Multi-View Contrastive Learning for Online Knowledge Distillation (MCL-OKD) ☆27 Apr 7, 2021 Updated 4 years ago
- 🚀 Automatically convert unstructured data into a high-quality 'textbook' format, optimized for fine-tuning Large Language Models (LLMs) ☆25 Oct 15, 2023 Updated 2 years ago
- xKV: Cross-Layer SVD for KV-Cache Compression ☆44 Nov 30, 2025 Updated 3 months ago
- ☆34 Sep 14, 2024 Updated last year
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ☆30 Mar 28, 2024 Updated last year
- Do Multilingual Language Models Think Better in English? ☆42 Aug 3, 2023 Updated 2 years ago
- Embedding Recycling for Language models ☆38 Jul 11, 2023 Updated 2 years ago
- Code for EMNLP 2021 main conference paper "Dynamic Knowledge Distillation for Pre-trained Language Models" ☆41 Aug 9, 2022 Updated 3 years ago
- ☆10 Oct 2, 2024 Updated last year
- ☆35 Mar 25, 2024 Updated last year
- Source code for paper "Trajectory of Alternating Direction Method of Multipliers and Adaptive Acceleration" of NeurIPS 2019 ☆10 Jan 25, 2024 Updated 2 years ago
- [ICIP 2024] Rethinking temporal self-similarity for repetitive action counting ☆10 Mar 10, 2025 Updated 11 months ago
- Open-source Human Feedback Library ☆11 Oct 25, 2023 Updated 2 years ago
- Meta-Reinforcement Learning with Policy Residual Representation ☆11 Aug 15, 2019 Updated 6 years ago
- ☆11 Jul 20, 2021 Updated 4 years ago
- Light Cube using PYNQ ☆10 Aug 4, 2018 Updated 7 years ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆48 Oct 20, 2025 Updated 4 months ago
- Repo for "On Learning to Summarize with Large Language Models as References" ☆43 May 24, 2023 Updated 2 years ago
- Martini middleware/handler for serving static files from binary data ☆30 May 17, 2014 Updated 11 years ago
- HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023] ☆14 Jul 11, 2023 Updated 2 years ago
- An AI Fitness Coach that corrects your workout form and counts your reps. ☆10 Jun 22, 2022 Updated 3 years ago
- PyTorch implementation of the paper "LAPRAN: A Scalable Laplacian Pyramid Reconstructive Adversarial Network for Flexible Compressive Sen… ☆11 May 21, 2019 Updated 6 years ago
- Lift-style CSS selector transforms based on Scalate's Scuery ☆10 Aug 23, 2012 Updated 13 years ago
- This is an official implementation of our CVPR 2020 paper "Non-Local Neural Networks With Grouped Bilinear Attentional Transforms". ☆12 Jan 30, 2021 Updated 5 years ago
- Bachelor's grad work on code autocompletion with rnn ☆10 May 19, 2019 Updated 6 years ago
- ☆10 May 9, 2019 Updated 6 years ago
- Implementation for NATv2. ☆23 Feb 20, 2021 Updated 5 years ago
- ☆12 Oct 8, 2020 Updated 5 years ago
- ☆10 Aug 25, 2020 Updated 5 years ago
- ☆10 Dec 25, 2019 Updated 6 years ago
- Background materials for the article "Productivity Assessment of Neural Code Completion" ☆13 Jul 11, 2023 Updated 2 years ago
- decontamination ☆26 Dec 3, 2025 Updated 3 months ago