timinar / BabyLlama
Training code for Baby-Llama, our submission to the strict-small track of the BabyLM challenge.
☆79 Updated last year
Alternatives and similar repositories for BabyLlama:
Users who are interested in BabyLlama are comparing it to the repositories listed below.
- Implementation of NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆142 Updated last month
- ☆125 Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆142 Updated 7 months ago
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) ☆213 Updated last month
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆81 Updated 10 months ago
- Unofficial implementations of block/layer-wise pruning methods for LLMs. ☆67 Updated 11 months ago
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR2025] ☆105 Updated 2 months ago
- [ICML'24] The official implementation of "Rethinking Optimization and Architecture for Tiny Language Models" ☆121 Updated 3 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆154 Updated 10 months ago
- ☆237 Updated 11 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ☆148 Updated 2 weeks ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆160 Updated 9 months ago
- ☆76 Updated last week
- [NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin… ☆53 Updated 10 months ago
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ☆404 Updated 6 months ago
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by Deepmind ☆93 Updated last year
- ☆255 Updated last year
- ☆77 Updated 3 months ago
- A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to… ☆55 Updated last year
- ☆122 Updated 2 months ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024) ☆205 Updated 11 months ago
- Code for paper "Patch-Level Training for Large Language Models" ☆82 Updated 5 months ago
- ☆219 Updated 10 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆158 Updated 10 months ago
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" ☆123 Updated 11 months ago
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement ☆181 Updated last year
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆87 Updated 11 months ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆183 Updated 2 months ago
- [ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models ☆40 Updated 5 months ago
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)… ☆64 Updated last year