cliang1453 / task-aware-distillation
Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML 2023)
☆29 · Updated last year
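The paper's core idea is layer-wise distillation through task-aware filters: student hidden states are aligned to teacher hidden states after both pass through learned per-layer filters, so only task-relevant signal is matched. Below is a minimal, illustrative sketch of a loss in that spirit; the function name, the simple linear filters, and the layer mapping are assumptions for illustration, not the repository's actual API.

```python
# Minimal sketch of layer-wise distillation with task-aware filters.
# Illustrative only: the linear-filter design and names below are
# assumptions, not the paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F

def layerwise_distill_loss(student_hiddens, teacher_hiddens,
                           student_filters, teacher_filters, layer_map):
    """Match filtered student hidden states to filtered teacher states.

    student_hiddens / teacher_hiddens: lists of [batch, seq, dim] tensors.
    student_filters / teacher_filters: per-layer projections (hypothetical
        stand-ins for the paper's learned task-aware filters).
    layer_map: (student_layer, teacher_layer) index pairs to align.
    """
    loss = 0.0
    for s_idx, t_idx in layer_map:
        s = student_filters[s_idx](student_hiddens[s_idx])
        t = teacher_filters[t_idx](teacher_hiddens[t_idx]).detach()
        loss = loss + F.mse_loss(s, t)  # per-layer alignment term
    return loss / len(layer_map)

# Example wiring: a 6-layer student distilled from a 12-layer teacher,
# aligning every other teacher layer (hypothetical configuration).
student_filters = nn.ModuleList(nn.Linear(768, 768) for _ in range(6))
teacher_filters = nn.ModuleList(nn.Linear(768, 768) for _ in range(12))
layer_map = [(i, 2 * i + 1) for i in range(6)]
```

This distillation term would typically be added to the student's ordinary task loss, with the filters trained jointly so the matching signal stays task-relevant rather than reproducing full hidden states.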
Related projects
Alternatives and complementary repositories for task-aware-distillation
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models ☆59 · Updated 9 months ago
- AdaMerging: Adaptive Model Merging for Multi-Task Learning. ICLR, 2024. ☆53 · Updated 3 weeks ago
- [NeurIPS 2023] GitHub repository for "Composing Parameter-Efficient Modules with Arithmetic Operations" ☆58 · Updated 11 months ago
- Methods and evaluation for aligning language models temporally ☆24 · Updated 8 months ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ ☆29 · Updated 3 weeks ago
- EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning (ACL 2023) ☆22 · Updated last year
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆44 · Updated last year
- my commonly-used tools ☆47 · Updated 3 months ago
- This PyTorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022). ☆41 · Updated 2 years ago
- One Network, Many Masks: Towards More Parameter-Efficient Transfer Learning ☆38 · Updated last year
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors ☆69 · Updated 9 months ago
- [ICLR 2024] This is the repository for the paper titled "DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning" ☆94 · Updated 7 months ago
- On the Effectiveness of Parameter-Efficient Fine-Tuning ☆38 · Updated last year
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆75 · Updated last month
- This is the official implementation of ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting ☆14 · Updated 3 months ago
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So… ☆15 · Updated 5 months ago
- Source code for the TMLR paper "Black-Box Prompt Learning for Pre-trained Language Models" ☆55 · Updated last year
- A Survey on the Honesty of Large Language Models ☆46 · Updated last month
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models. ☆32 · Updated last week
- [ATTRIB @ NeurIPS 2024] When Attention Sink Emerges in Language Models: An Empirical View ☆29 · Updated last month
- Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model ☆65 · Updated 2 years ago
- Evaluating the Ripple Effects of Knowledge Editing in Language Models ☆50 · Updated 7 months ago
- Code & Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆10 · Updated 5 months ago
- [ACL 2023] Code for paper “Tailoring Instructions to Student’s Learning Levels Boosts Knowledge Distillation” (https://arxiv.org/abs/2305.… ☆37 · Updated last year
- A curated list of awesome resources dedicated to Scaling Laws for LLMs ☆63 · Updated last year
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024] ☆49 · Updated last week
- This is the official repository for "Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts" (EMNLP 2022) ☆97 · Updated last year