Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective"
☆33May 9, 2024Updated last year
Alternatives and similar repositories for ParaKnowTransfer
Users that are interested in ParaKnowTransfer are comparing it to the libraries listed below
Sorting:
- The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions (EMNLP 2023))☆13Dec 21, 2023Updated 2 years ago
- Radiantloom Email Assist 7B is an email-assistant large language model fine-tuned from Zephyr-7B-Beta, over a custom-curated dataset of 1…☆14Jan 19, 2024Updated 2 years ago
- Official PyTorch implementation of CD-MOE☆12Mar 29, 2025Updated 11 months ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆45Oct 1, 2025Updated 5 months ago
- [ACL'22] Training-free Neural Architecture Search for RNNs and Transformers☆14May 26, 2024Updated last year
- Codes for DATA: Differentiable ArchiTecture Approximation.☆11Jul 22, 2021Updated 4 years ago
- ☆10Jul 25, 2024Updated last year
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models☆40Jun 10, 2024Updated last year
- Auto-Prox-AAAI24☆14Apr 30, 2024Updated last year
- Data-free knowledge distillation using Gaussian noise (NeurIPS paper)☆15Mar 24, 2023Updated 2 years ago
- ☆20Aug 16, 2021Updated 4 years ago
- PyTorch implementation for OD-cheap-convolution.☆20Sep 29, 2019Updated 6 years ago
- ☆19May 11, 2021Updated 4 years ago
- GIFT (ACL 2023) & MPC-BERT (ACL 2021) for Multi-Party Conversation Understanding☆41Jul 12, 2023Updated 2 years ago
- Repository for the EMNLP 2023 Demo Paper "Reaction Miner: An Integrated System for Chemical Reaction Extraction from Textual Data"☆19Jan 27, 2025Updated last year
- Learning adapter weights from task descriptions☆19Nov 12, 2023Updated 2 years ago
- [NeurIPS 2019] E2-Train: Training State-of-the-art CNNs with Over 80% Less Energy☆21Nov 18, 2019Updated 6 years ago
- ☆18Nov 6, 2019Updated 6 years ago
- ☆19Mar 5, 2019Updated 7 years ago
- Code repository for the paper - "Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass"☆21Aug 22, 2024Updated last year
- Code and dataset for the emnlp paper titled Instruct and Extract: Instruction Tuning for On-Demand Information Extraction☆54Jan 2, 2024Updated 2 years ago
- ☆26Apr 12, 2022Updated 3 years ago
- A simple pytorch implementation of Differentiable Architecture Search (DARTS)☆22Aug 27, 2019Updated 6 years ago
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation☆51Aug 24, 2025Updated 6 months ago
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024)☆20Feb 16, 2024Updated 2 years ago
- Momentum Decoding: Open-ended Text Generation as Graph Exploration☆19Jan 27, 2023Updated 3 years ago
- Revisiting Parameter Sharing for Automatic Neural Channel Number Search, NeurIPS 2020☆22Nov 15, 2020Updated 5 years ago
- [ICML 2021 Oral] "CATE: Computation-aware Neural Architecture Encoding with Transformers" by Shen Yan, Kaiqiang Song, Fei Liu, Mi Zhang☆19Jun 23, 2021Updated 4 years ago
- ☆18Oct 16, 2019Updated 6 years ago
- Implementation of PGONAS for CVPR22W and RD-NAS for ICASSP23☆23Apr 25, 2023Updated 2 years ago
- The official project website of "NORM: Knowledge Distillation via N-to-One Representation Matching" (The paper of NORM is published in IC…☆20Sep 18, 2023Updated 2 years ago
- [COLING'22] Code for our paper: "COLO: A Contrastive Learning based Re-ranking Framework for One-Stage Summarization"☆22Oct 21, 2022Updated 3 years ago
- D^2-MoE: Delta Decompression for MoE-based LLMs Compression☆72Mar 25, 2025Updated 11 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal …☆56Feb 28, 2023Updated 3 years ago
- [NeurIPS 2020] "Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?" by Shen Yan, Yu Zheng, Wei Ao, X…☆50Jan 19, 2021Updated 5 years ago
- TF-FD☆20Nov 19, 2022Updated 3 years ago
- ☆26Dec 10, 2020Updated 5 years ago
- Bilinear CNNs in PyTorch☆21Oct 12, 2019Updated 6 years ago
- [NeurIPS 2021] “Stronger NAS with Weaker Predictors“, Junru Wu, Xiyang Dai, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Ye Yu, Zhangyang W…☆27Sep 23, 2022Updated 3 years ago