cofe-ai / MSG
Masked Structural Growth for 2x Faster Language Model Pre-training
☆22Updated 6 months ago
Related projects ⓘ
Alternatives and complementary repositories for MSG
- ☆64Updated last month
- ☆89Updated last month
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models☆73Updated 8 months ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)☆77Updated last month
- Repo for the EMNLP'24 Paper "Dual-Space Knowledge Distillation for Large Language Models".☆37Updated 2 weeks ago
- Easy control for Key-Value Constrained Generative LLM Inference(https://arxiv.org/abs/2402.06262)☆58Updated 9 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]☆127Updated 2 months ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024)☆199Updated 6 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment"☆68Updated 5 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623☆71Updated 2 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"☆120Updated this week
- Codebase for Instruction Following without Instruction Tuning☆32Updated 2 months ago
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24)☆17Updated 2 weeks ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”☆118Updated 4 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments"☆56Updated 8 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks☆130Updated 2 months ago
- An Experiment on Dynamic NTK Scaling RoPE☆61Updated last year
- OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure☆18Updated 3 months ago
- [EMNLP 2023]Context Compression for Auto-regressive Transformers with Sentinel Tokens☆21Updated last year
- Long Context Extension and Generalization in LLMs☆39Updated 2 months ago
- LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models☆70Updated last month
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…☆52Updated 2 weeks ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.☆51Updated 3 weeks ago
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI"☆86Updated this week
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings☆146Updated 5 months ago
- Lightweight tool to identify Data Contamination in LLMs evaluation☆42Updated 8 months ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"☆28Updated 7 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…☆44Updated last year
- The HELMET Benchmark☆75Updated 2 weeks ago
- A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to…☆49Updated last year