sail-sg / scaling-with-vocab
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
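The paper's headline claim is that the compute-optimal vocabulary size grows with model scale, with vocabulary parameters following a power law that is slower than the growth of non-vocabulary parameters. Below is a minimal illustrative sketch of that idea, not the repository's released code: the constants `K`, `GAMMA`, and `EMBED_DIM`, and the function `optimal_vocab_size`, are hypothetical placeholders rather than the paper's fitted values.

```python
# Illustrative sketch: estimate a suggested vocabulary size from the
# non-vocabulary parameter count, assuming a power-law form
# N_v = K * N_nv^GAMMA with GAMMA < 1 (vocabulary parameters grow more
# slowly than the rest of the model). Constants are placeholders, not
# the paper's fitted values.

K = 2.0           # hypothetical proportionality constant
GAMMA = 0.8       # hypothetical exponent (< 1)
EMBED_DIM = 4096  # hidden size used to convert vocab parameters to tokens


def optimal_vocab_size(n_nonvocab_params: float) -> int:
    """Map a non-vocabulary parameter count to a suggested vocab size."""
    n_vocab_params = K * n_nonvocab_params ** GAMMA
    # Vocabulary parameters = vocab_size * embedding_dim (input embedding).
    return int(n_vocab_params / EMBED_DIM)


if __name__ == "__main__":
    for n in (1e9, 7e9, 70e9):
        print(f"{n:.0e} non-vocab params -> vocab ~ {optimal_vocab_size(n):,}")
```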
Related projects:
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".
- Code for the paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning"
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
- Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models"
- Directional Preference Alignment
- Official implementation for the paper *DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
- Official repository for the paper "Weak-to-Strong Extrapolation Expedites Alignment"
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs
- Code and data for the paper "Finding Transformer Circuits with Edge Pruning".
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).
- Online Adaptation of Language Models with a Memory of Amortized Contexts
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation, ICML 2024
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)"
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal, et al.
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
- Self-Explore to avoid the pit! Improving the Reasoning Capabilities of Language Models with Fine-grained Rewards
- Code and data for the benchmark "Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models"
- Knowledge Circuits in Pretrained Transformers