GeneZC / MiniMA
Code for paper titled "Towards the Law of Capacity Gap in Distilling Language Models"
☆99 · Updated 7 months ago
Alternatives and similar repositories for MiniMA:
Users that are interested in MiniMA are comparing it to the libraries listed below
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆140 · Updated 4 months ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' ☆76 · Updated last year
- Unofficial implementation of AlpaGasus ☆90 · Updated last year
- An Experiment on Dynamic NTK Scaling RoPE ☆62 · Updated last year
- FuseAI Project ☆83 · Updated 3 weeks ago
- Reformatted Alignment ☆114 · Updated 4 months ago
- MultilingualSIFT: Multilingual Supervised Instruction Fine-tuning ☆89 · Updated last year
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024) ☆204 · Updated 8 months ago
- Spherical Merge PyTorch/HF-format Language Models with minimal feature loss. ☆115 · Updated last year
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆120 · Updated last month
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆76 · Updated 11 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆128 · Updated 3 months ago
- Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens" ☆125 · Updated 6 months ago
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks" ☆53 · Updated 9 months ago
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" ☆100 · Updated 7 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆145 · Updated 7 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆139 · Updated 5 months ago
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆53 · Updated this week
- Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning. ☆60 · Updated 10 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆135 · Updated 3 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆107 · Updated 9 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆98 · Updated 5 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆128 · Updated 8 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆114 · Updated 8 months ago
- Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is al… ☆111 · Updated last year
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆153 · Updated 8 months ago