[ACL 2023 Findings] Emergent Modularity in Pre-trained Transformers
☆26 · Jun 7, 2023 · Updated 2 years ago
Alternatives and similar repositories for Modularity-Analysis
Users interested in Modularity-Analysis are comparing it to the repositories listed below.
- Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL2024] ☆39 · May 28, 2024 · Updated last year
- Source code for a LoRA-based continual relation extraction method. ☆14 · Sep 25, 2023 · Updated 2 years ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal, et al. ☆56 · Feb 28, 2023 · Updated 3 years ago
- Repository for Sparse Universal Transformers ☆20 · Oct 23, 2023 · Updated 2 years ago
- The open-source materials for the paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity". ☆30 · Nov 12, 2024 · Updated last year
- Mixture of Attention Heads ☆51 · Oct 10, 2022 · Updated 3 years ago
- ☆19 · Oct 31, 2022 · Updated 3 years ago
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆20 · Jul 19, 2024 · Updated last year
- [ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The Truth is Rarely Pure and Never Simple. ☆27 · Apr 21, 2025 · Updated 10 months ago
- Code repository for the public reproduction of the language modelling experiments in "MatFormer: Nested Transformer for Elastic Inference" ☆31 · Nov 14, 2023 · Updated 2 years ago
- [ACL 2023] Plug-and-Play Knowledge Injection for Pre-trained Language Models ☆61 · Apr 1, 2024 · Updated last year
- Muon fsdp 2 ☆53 · Aug 8, 2025 · Updated 6 months ago
- Source code of the paper "KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing" ☆31 · Oct 24, 2024 · Updated last year
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022). ☆114 · May 2, 2022 · Updated 3 years ago
- Sparse Backpropagation for Mixture-of-Expert Training ☆29 · Jul 2, 2024 · Updated last year
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward experts. ☆226 · Sep 18, 2025 · Updated 5 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆38 · Jun 11, 2025 · Updated 8 months ago
- 📄 Evidence Retrieval and Claim Verification for the FEVER shared task using Transformer Networks ☆12 · Feb 21, 2020 · Updated 6 years ago
- Triton-based implementation of Sparse Mixture of Experts. ☆266 · Oct 3, 2025 · Updated 4 months ago
- LMTuner: Make the LLM Better for Everyone ☆38 · Sep 21, 2023 · Updated 2 years ago
- [ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ☆106 · Dec 20, 2024 · Updated last year
- ☆92 · Dec 23, 2024 · Updated last year
- Empowering Scientific Research with AI Assistance! Open Source Code for Data-Driven Dimensional Analysis. ☆14 · Aug 7, 2025 · Updated 6 months ago
- Implementation of the data dimensionality reduction algorithms SVD and CUR without using library functions. ☆10 · Jul 24, 2017 · Updated 8 years ago
- Format conversion and graphical representation of [Universal Dependencies](http://universaldependencies.org) trees. ☆12 · Sep 3, 2024 · Updated last year
- Code for the COLING 2022 paper "MuCDN: Mutual Conversational Detachment Network for Emotion Recognition in Multi-Party Conversations" ☆10 · Jul 21, 2023 · Updated 2 years ago
- Re-identification of campus bicycles from surveillance-quality footage, including ReID model recognition, vector-database retrieval, and a UI demo. ☆10 · Feb 13, 2024 · Updated 2 years ago
- AI_Tools_and_Papers_Providers_Frameworks is a curated collection of AI resources, including tools, papers, frameworks, and providers. It … ☆45 · Nov 4, 2025 · Updated 3 months ago
- LibMoE: A Library for Comprehensive Benchmarking Mixture of Experts in Large Language Models