KaihuaTang / Qwen-Tokenizer-Pruner
Due to the huge vocabulary size (151,936) of Qwen models, the Embedding and LM Head weights are excessively heavy. Therefore, this project provides a Tokenizer vocabulary shearing (pruning) solution for Qwen and Qwen-VL.
☆12 · Updated 6 months ago
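The pruning idea itself is simple: keep only the token ids your target corpus actually uses, then slice those rows out of the embedding and LM-head matrices. Below is a minimal sketch of that weight-slicing step; it assumes a HuggingFace-style causal LM, and the `prune_vocab` helper and `kept_token_ids` argument are hypothetical names for illustration, not this repo's actual API.

```python
import torch

def prune_vocab(model, kept_token_ids):
    """Slice embedding and LM-head rows down to a pruned vocabulary.

    `model` is assumed to be a HuggingFace-style causal LM (e.g. Qwen);
    `kept_token_ids` is the set of token ids that survive pruning.
    Remapping the tokenizer itself (old id -> new id) is omitted here.
    """
    keep = torch.tensor(sorted(kept_token_ids), dtype=torch.long)

    # Input embedding: (151936, hidden) -> (len(keep), hidden).
    old_emb = model.get_input_embeddings()
    new_emb = torch.nn.Embedding(len(keep), old_emb.embedding_dim)
    new_emb.weight.data.copy_(old_emb.weight.data[keep])
    model.set_input_embeddings(new_emb)

    # LM head: slice the same rows so logits align with the new ids.
    old_head = model.get_output_embeddings()
    new_head = torch.nn.Linear(old_head.in_features, len(keep), bias=False)
    new_head.weight.data.copy_(old_head.weight.data[keep])
    model.set_output_embeddings(new_head)

    model.config.vocab_size = len(keep)
    return model
```

Remapping the tokenizer (old id to new id) and handling tied input/output embeddings are the fiddly parts a full solution would also need to cover; the sketch only shows where the memory savings come from.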
Alternatives and similar repositories for Qwen-Tokenizer-Pruner:
Users interested in Qwen-Tokenizer-Pruner are comparing it to the libraries listed below.
- The code and data for the paper JiuZhang3.0 ☆40 · Updated 8 months ago
- Code for the paper "UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning", ACL 2022 ☆59 · Updated 2 years ago
- ☆61 · Updated 8 months ago
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria ☆63 · Updated 4 months ago
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023) ☆35 · Updated 10 months ago
- Inference code for the paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models" ☆35 · Updated 6 months ago
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆39 · Updated 3 months ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context" ☆29 · Updated 7 months ago
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" ☆45 · Updated 3 months ago
- ☆95 · Updated 4 months ago
- Code for the paper "Patch-Level Training for Large Language Models" ☆80 · Updated 3 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆76 · Updated 11 months ago
- Feeling confused about superalignment? Here is a reading list ☆42 · Updated last year
- ☆57 · Updated 2 months ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models ☆63 · Updated 3 months ago
- my commonly-used tools ☆50 · Updated last month
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales ☆31 · Updated last year
- Code for the ACL 2022 paper "StableMoE: Stable Routing Strategy for Mixture of Experts" ☆45 · Updated 2 years ago
- [ICML 2024] Can AI Assistants Know What They Don't Know? ☆77 · Updated last year
- [ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models ☆34 · Updated 2 months ago
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training ☆69 · Updated 2 months ago
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning" ☆110 · Updated 3 months ago
- [ACL 2023] Code for the paper "Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation" (https://arxiv.org/abs/2305.…) ☆38 · Updated last year
- Implementations of the online merging optimizers proposed in "Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment" ☆73 · Updated 8 months ago
- ☆17 · Updated 2 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆17 · Updated last month
- ☆45 · Updated 8 months ago
- M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆55 · Updated last month