timinar / BabyLlamaLinks

Training code for Baby-Llama, our submission to the strict-small track of the BabyLM challenge.

☆80

Alternatives and similar repositories for BabyLlama

Users that are interested in BabyLlama are comparing it to the libraries listed below

Sorting:

wuhy68 / Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
☆144Updated 9 months ago
shaochenze / PatchTrain
Code for paper "Patch-Level Training for Large Language Models"
☆85Updated 7 months ago
princeton-nlp / CEPE
[ACL 2024] Long-Context Language Modeling with Parallel Encodings
☆154Updated last year
jongwooko / distillm
Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024)
☆222Updated 3 months ago
DAMO-NLP-SG / CLEX
[ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models
☆78Updated last year
HKUNLP / ChunkLlama
[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
☆411Updated 8 months ago
YuchuanTian / RethinkTinyLM
[ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”
☆121Updated 5 months ago
whyNLP / LCKV
Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…
☆149Updated 2 months ago
HanGuo97 / lq-lora
☆126Updated last year
astramind-ai / Mixture-of-depths
Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆163Updated last year
princeton-nlp / ProLong
Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"
☆202Updated 3 months ago
sail-sg / regmix
[ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight)
☆149Updated 4 months ago
sramshetty / ShortGPT
Unofficial implementations of block/layer-wise pruning methods for LLMs.
☆70Updated last year
UNITES-Lab / MC-SMoE
[ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
☆85Updated last week
Glaciohound / LM-Infinite
Implementation of NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
☆146Updated 3 months ago
jshuadvd / LongRoPE
Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper
☆137Updated 11 months ago
princeton-nlp / QuRating
[ICML 2024] Selecting High-Quality Data for Training Language Models
☆176Updated last year
SparkJiao / llama-pipeline-parallel
A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to…
☆55Updated last year
FasterDecoding / SnapKV
☆256Updated last year
FasterDecoding / TEAL
☆130Updated 4 months ago
18907305772 / FuseAI
FuseAI Project
☆87Updated 5 months ago
TIGER-AI-Lab / LongICLBench
Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR2025]
☆105Updated 4 months ago
shreyansh26 / Speculative-Sampling
Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by Deepmind
☆98Updated last year
dwzhu-pku / PoSE
Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024)
☆203Updated last year
sanyalsunny111 / LLM-Inheritune
This is the official repository for Inheritune.
☆111Updated 4 months ago
SkyworkAI / Skywork-MoE
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
☆133Updated last year
SalesforceAIResearch / GemFilter
☆80Updated 5 months ago
gpt4life / alpagasus
Unofficial implementation of AlpaGasus
☆91Updated last year
yxli2123 / LoftQ
☆223Updated last year
tianyi-lab / Superfiltering
[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
☆160Updated this week