mingkai-zheng / GENIUS
Can GPT-4 Perform Neural Architecture Search?
☆82 · Updated last year
Related projects
Alternatives and complementary repositories for GENIUS
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So… ☆15 · Updated 5 months ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers. ☆43 · Updated last year
- The implementation for MLSys 2023 paper: "Cuttlefish: Low-rank Model Training without All The Tuning" ☆43 · Updated last year
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆79 · Updated last year
- Activation-aware Singular Value Decomposition for Compressing Large Language Models ☆49 · Updated 3 weeks ago
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆68 · Updated 5 months ago
- [NeurIPS 2021] "MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge", Geng Yuan, Xiaolong Ma, Yanzhi Wang et al… ☆18 · Updated 2 years ago
- [ICLR 2021] "Long Live the Lottery: The Existence of Winning Tickets in Lifelong Learning" by Tianlong Chen*, Zhenyu Zhang*, Sijia Liu, S… ☆23 · Updated 2 years ago
- Code accompanying the paper "Massive Activations in Large Language Models" ☆123 · Updated 8 months ago
- Recycling diverse models ☆44 · Updated last year
- NAS Benchmark in "Prioritized Architecture Sampling with Monto-Carlo Tree Search", CVPR2021 ☆37 · Updated 3 years ago
- Code and data for the benchmark "Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Lan… ☆34 · Updated 4 months ago
- AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning (Zhou et al.; TACL) ☆44 · Updated 8 months ago
- BESA is a differentiable weight pruning technique for large language models. ☆14 · Updated 8 months ago
- Official PyTorch implementation of "Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets" (ICLR 2021) ☆63 · Updated 3 months ago
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆47 · Updated last year
- [ICML 2021] "Efficient Lottery Ticket Finding: Less Data is More" by Zhenyu Zhang*, Xuxi Chen*, Tianlong Chen*, Zhangyang Wang ☆25 · Updated 2 years ago
- [ICML 2024] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark". ☆73 · Updated 4 months ago
- The Efficiency Spectrum of LLM ☆52 · Updated 11 months ago
- [IJCAI 2023] Black-box Prompt Tuning for Vision-Language Model as a Service ☆15 · Updated last year
- Code for T-MARS data filtering ☆35 · Updated last year
- Prospect Pruning: Finding Trainable Weights at Initialization Using Meta-Gradients ☆29 · Updated 2 years ago
- AdaMerging: Adaptive Model Merging for Multi-Task Learning. ICLR, 2024. ☆53 · Updated 3 weeks ago
- [ICLR'24] "DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training" by Aochuan Chen*, Yimeng Zhang*, Jinghan Jia, James Di… ☆43 · Updated last month
- AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers ☆42 · Updated 2 years ago