fxmeng / mixtral_spliter
Converting Mixtral-8x7B to Mixtral-[1~7]x7B
★20 · Updated 8 months ago
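The description suggests the tool derives a smaller Mixtral-[1~7]x7B by keeping only a subset of Mixtral-8x7B's eight experts. A minimal sketch of that idea, assuming HuggingFace-style Mixtral checkpoint keys (`block_sparse_moe.experts.{e}` weights plus a router `gate.weight` with one row per expert); the key names and the procedure are illustrative assumptions, not the repo's actual implementation:

```python
import torch

def keep_experts(state_dict, keep):
    """Build a state dict that keeps only the experts listed in `keep`.

    Assumed (hypothetical) Mixtral-style keys:
      model.layers.{l}.block_sparse_moe.experts.{e}.w1.weight  # per-expert MLP
      model.layers.{l}.block_sparse_moe.gate.weight            # (num_experts, hidden)
    """
    keep = sorted(keep)
    remap = {old: new for new, old in enumerate(keep)}  # old expert id -> new id
    out = {}
    for name, tensor in state_dict.items():
        if ".block_sparse_moe.experts." in name:
            prefix, rest = name.split(".experts.", 1)
            idx_str, suffix = rest.split(".", 1)
            old = int(idx_str)
            if old not in remap:
                continue  # drop weights of pruned experts
            out[f"{prefix}.experts.{remap[old]}.{suffix}"] = tensor
        elif name.endswith("block_sparse_moe.gate.weight"):
            out[name] = tensor[keep].clone()  # keep router rows of surviving experts
        else:
            out[name] = tensor
    return out

# Example: a hypothetical Mixtral-2x7B keeping experts 0 and 3.
# sd = torch.load("mixtral_8x7b.pt", map_location="cpu")
# torch.save(keep_experts(sd, keep=[0, 3]), "mixtral_2x7b.pt")
```

Beyond the weights, the model config's expert count (and a top-k routing setting no larger than it) would also have to be reduced to match the pruned checkpoint.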
Related projects
Alternatives and complementary repositories for mixtral_spliter
- 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training · ★88 · Updated last month
- Touchstone: Evaluating Vision-Language Models by Language Models · ★78 · Updated 10 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models · ★73 · Updated 8 months ago
- [ICML'24] The official implementation of "Rethinking Optimization and Architecture for Tiny Language Models" · ★118 · Updated 4 months ago
- Implementations of the online merging optimizers proposed in "Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment" · ★66 · Updated 5 months ago
- A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to… · ★49 · Updated last year
- ★17 · Updated last year
- [ICML 2024] Selecting High-Quality Data for Training Language Models · ★146 · Updated 5 months ago
- ★64 · Updated 7 months ago
- Fantastic Data Engineering for Large Language Models · ★50 · Updated 3 months ago
- [NeurIPS 2024] Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies (https://arxiv.org/abs/2407.13623) · ★69 · Updated last month
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings · ★147 · Updated 5 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs · ★38 · Updated 4 months ago
- An Experiment on Dynamic NTK Scaling RoPE · ★61 · Updated 11 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models · ★126 · Updated 5 months ago
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria · ★55 · Updated last month
- Code for "Scaling Laws of RoPE-based Extrapolation" · ★70 · Updated last year
- Official code for our paper "LoRA-Pro: Are Low-Rank Adapters Properly Optimized?" · ★74 · Updated 3 weeks ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context" · ★26 · Updated 4 months ago
- [ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear attention… · ★98 · Updated 5 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning · ★125 · Updated 2 months ago
- ★35 · Updated 2 months ago
- Code for the paper "Patch-Level Training for Large Language Models" · ★71 · Updated this week
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales · ★31 · Updated last year
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model · ★23 · Updated last year
- Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process · ★22 · Updated 3 months ago
- Code for the paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning" · ★63 · Updated 9 months ago
- ★45 · Updated last year
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" · ★43 · Updated 3 weeks ago