sail-sg/SkyLadder

The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling

☆43

Alternatives and similar repositories for SkyLadder

Users who are interested in SkyLadder are comparing it to the repositories listed below.

real-absolute-AI / Unnatural_Language
The official repository of 'Unnatural Languages Are Not Bugs but Features for LLMs'
☆24 · May 20, 2025 · Updated last year
JinjieNi / Quokka
The official GitHub repo for "Training Optimal Large Diffusion Language Models", the first-ever large-scale diffusion language models sca…
☆46 · Nov 6, 2025 · Updated 8 months ago
sail-sg / P-DoS
[ArXiv 2025] Denial-of-Service Poisoning Attacks on Large Language Models
☆23 · Oct 22, 2024 · Updated last year
sail-sg / VeriFree
Reinforcing General Reasoning without Verifiers
☆102 · Jun 24, 2025 · Updated last year
assafbk / OPRM
Overflow Prevention Enhances Long-Context Recurrent LLMs (COLM 2025)
☆18 · Jul 8, 2025 · Updated last year
sail-sg / scaling-with-vocab
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
☆112 · Sep 26, 2024 · Updated last year
ZihaoHuang-notabot / Ultra-Sparse-Memory-Network
☆48 · Jul 3, 2026 · Updated 3 weeks ago
sail-sg / ActivePRM
☆21 · Apr 16, 2025 · Updated last year
haonan3 / V1
V1: Toward Multimodal Reasoning by Designing Auxiliary Task
☆36 · Apr 14, 2025 · Updated last year
anpaure / cp_eval
Tiny evaluation of leading LLMs on competitive programming problems
☆14 · Apr 10, 2026 · Updated 3 months ago
sail-sg / Video-Next-Event-Prediction
☆28 · Aug 9, 2025 · Updated 11 months ago
sail-sg / D-TRAK
Intriguing Properties of Data Attribution on Diffusion Models (ICLR 2024)
☆39 · Jan 23, 2024 · Updated 2 years ago
haonan3 / AnchorContext
AnchorAttention: Improved attention for LLMs long-context training
☆216 · Jan 15, 2025 · Updated last year
sail-sg / Attention-Sink
[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)
☆164 · Jul 8, 2025 · Updated last year
sail-sg / feedback-conditional-policy
Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"
☆65 · Jan 5, 2026 · Updated 6 months ago
sail-sg / sailor2
🔱 Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs
☆73 · Mar 21, 2025 · Updated last year
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆47 · Apr 15, 2025 · Updated last year
sail-sg / LightTrans
The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation"
☆22 · Apr 22, 2025 · Updated last year
Lyun0912-wu / LongAttn
LongAttn: Selecting Long-context Training Data via Token-level Attention
☆15 · Jul 16, 2025 · Updated last year
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆93 · Oct 30, 2024 · Updated last year
BryceZhuo / HybridNorm
The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
☆19 · Mar 7, 2025 · Updated last year
AbanteAI / LoCoDiff-bench
☆33 · Oct 15, 2025 · Updated 9 months ago
goombalab / Gather-and-Aggregate
Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism"
☆16 · Apr 30, 2025 · Updated last year
tilde-research / nsa-release
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆133 · Jun 24, 2025 · Updated last year
yuzhaouoe / pretraining-data-packing
[ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training
☆24 · Aug 18, 2024 · Updated last year
sail-sg / AnytimeReasoner
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
☆54 · Jul 15, 2025 · Updated last year
YisongMiao / DiSQ-Score
The Dataset and Official Implementation for <Discursive Socratic Questioning: Evaluating the Faithfulness of Language Models’ Understandi…
☆18 · Aug 7, 2024 · Updated last year
sail-sg / DiffMemorize
[TMLR 2025] On Memorization in Diffusion Models
☆33 · Oct 5, 2023 · Updated 2 years ago
sail-sg / regmix
[ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight)
☆194 · Feb 17, 2025 · Updated last year
model-architectures / GRAPE
[ICLR 2026] GRAPE: Group Representational Position Encoding (https://arxiv.org/abs/2512.07805)
☆115 · Jun 15, 2026 · Updated last month
hkust-nlp / PreSelect
[ICML 2025] Predictive Data Selection: The Data That Predicts Is the Data That Teaches
☆66 · Mar 4, 2025 · Updated last year
wdlctc / delta-attention-residuals-code
Delta Attention Residuals - supplementary code and pretrained models
☆40 · May 20, 2026 · Updated 2 months ago
RobertCsordas / moeut
☆93 · Aug 18, 2024 · Updated last year
zhenyuhe00 / SWE-Swiss
SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution
☆105 · Sep 24, 2025 · Updated 10 months ago
JinjieNi / dlms-are-super-data-learners
The official GitHub repo for "Diffusion Language Models are Super Data Learners".
☆227 · Nov 6, 2025 · Updated 8 months ago
Dao-AILab / grouped-latent-attention
☆135 · May 29, 2025 · Updated last year
sail-sg / variational-reasoning
Code for "Variational Reasoning for Language Models"
☆60 · Sep 29, 2025 · Updated 9 months ago
SDLAML / disco
☆16 · Dec 11, 2025 · Updated 7 months ago
sail-sg / lm-random-memory-access
☆15 · Mar 12, 2024 · Updated 2 years ago