timinar / BabyLlama
Training code for Baby-Llama, our submission to the strict-small track of the BabyLM challenge.
☆85 · Oct 18, 2023 · Updated 2 years ago
Alternatives and similar repositories for BabyLlama
Users who are interested in BabyLlama are comparing it to the repositories listed below.
- Code for pre-training BabyLM baseline models. ☆16 · Jun 19, 2023 · Updated 2 years ago
- [ICCAD 2025] Squant ☆15 · Jul 3, 2025 · Updated 7 months ago
- KDSS is the framework for knowledge distillation from LLMs ☆12 · Nov 5, 2025 · Updated 3 months ago
- Cascade Speculative Drafting ☆32 · Apr 2, 2024 · Updated last year
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" ☆39 · Jan 12, 2024 · Updated 2 years ago
- [NeurIPS '25] Multi-Token Prediction Needs Registers ☆26 · Dec 14, 2025 · Updated 2 months ago
- ☆16 · Oct 16, 2024 · Updated last year
- Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML2023) ☆40 · Aug 28, 2023 · Updated 2 years ago
- Implementation of "Decoding-time Realignment of Language Models", ICML 2024. ☆21 · Jun 17, 2024 · Updated last year
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation, ICML 2024 ☆22 · Jun 26, 2024 · Updated last year
- OpenBA-V2: 3B LLM (Large Language Model) with T5 architecture, utilizing model pruning technique and continuing pretraining from OpenBA-1… ☆25 · May 10, 2024 · Updated last year
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆327 · Nov 26, 2025 · Updated 2 months ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… ☆31 · Nov 14, 2023 · Updated 2 years ago
- a simplified version of Google's Gemma model to be used for learning ☆26 · Mar 2, 2024 · Updated last year
- Unofficial implementations of block/layer-wise pruning methods for LLMs. ☆77 · Apr 29, 2024 · Updated last year
- ☆15 · Sep 3, 2025 · Updated 5 months ago
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆640 · Mar 4, 2024 · Updated last year
- [Preprint] Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Prunin… ☆41 · Sep 9, 2025 · Updated 5 months ago
- This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicit… ☆1,252 · Mar 9, 2025 · Updated 11 months ago
- (ACL 2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation ☆34 · May 28, 2025 · Updated 8 months ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆32 · May 29, 2024 · Updated last year
- ☆580 · Sep 7, 2023 · Updated 2 years ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ☆30 · Mar 28, 2024 · Updated last year
- A Framework for Evaluating AI Agent Safety in Realistic Environments