liziniu/cold_start_rl

Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?

☆19

Alternatives and similar repositories for cold_start_rl

Users who are interested in cold_start_rl are comparing it to the repositories listed below.

koalazf99 / nanoverl
Collections of RLxLM experiments using minimal codes
☆14 · Feb 17, 2025 · Updated last year
ant-research / long-context-modeling
Research work aimed at addressing the problem of modeling infinite-length context
☆46 · Dec 18, 2025 · Updated 2 months ago
QwenLM / PolyMath
[NeurIPS 2025 D&B Track] Evaluation Code Repo for Paper "PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts"
☆41 · May 22, 2025 · Updated 9 months ago
Sphere-AI-Lab / FormalMATH-Bench
☆76 · Jan 8, 2026 · Updated last month
ictnlp / Seq-NAT
Source code for <Sequence-Level Training for Non-Autoregressive Neural Machine Translation>.
☆24 · Jan 17, 2022 · Updated 4 years ago
GAIR-NLP / AIME-Preview
☆80 · Mar 11, 2025 · Updated 11 months ago
HawHello / zero-music-mern
A full-stack online music app, developed using MERN stack (React, Express.js, MongoDB) and Electron. Libraries including Tailwind CSS, Re…
☆10 · Jul 2, 2024 · Updated last year
liziniu / GEM
Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models)
☆52 · May 12, 2025 · Updated 9 months ago
NuoJohnChen / JudgeLRM
JudgeLRM: Large Reasoning Models as a Judge
☆41 · Jan 29, 2026 · Updated last month
opqrstuvcut / BertMouth
This repository is a reimplementation of the paper(BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model: htt…
☆11 · Nov 14, 2019 · Updated 6 years ago
kochkinaelena / Uncertainty4VerificationModels
Code for the experiments in the ACL 2020 paper "Estimating predictive uncertainty for rumour verification models"
☆11 · May 15, 2020 · Updated 5 years ago
model-architectures / GRAPE
[ICLR 2026] GRAPE: Group Representational Position Encoding (https://arxiv.org/abs/2512.07805)
☆79 · Jan 27, 2026 · Updated last month
OpenCausaLab / MORE
☆15 · Jan 9, 2026 · Updated last month
henrykmichalewski / math-evals
Math evaluations of llama models.
☆10 · Jan 3, 2024 · Updated 2 years ago
ddollar / anvil
Generic build server
☆64 · May 25, 2014 · Updated 11 years ago
DzvinkaYarish / ControlNet-different-backbones
☆12 · Jun 15, 2023 · Updated 2 years ago
clue-ai / ChatYuan-7B
ChatYuan-7B
☆13 · Jun 16, 2023 · Updated 2 years ago
balena-io-experimental / cellular-test
A sample app to debug and validate cellular modems on balena devices
☆13 · Jun 5, 2019 · Updated 6 years ago
mattf1n / basis-aware-threshold
Code for the paper "Closing the Curious Case of Neural Text Degeneration"
☆11 · Apr 9, 2025 · Updated 10 months ago
RLHFlow / GVM
☆16 · Jul 29, 2025 · Updated 7 months ago
domdomegg / gdoc2latex
🔀 📝 Convert Google Docs files to LaTeX
☆11 · Dec 10, 2025 · Updated 2 months ago
hlz0606 / DH-AUG-DH-Forward-Kinematics-Model-Driven-Augmentation-for-3D-Human-Pose-Estimation
[ECCV 2022] DH-AUG: DH Forward Kinematics Model Driven Augmentation for 3D Human Pose Estimation
☆12 · Nov 21, 2022 · Updated 3 years ago
hekike / ES6-Immutable-React
React 0.13 with ES6, Immutable.js and Flux, Isomorphic as well
☆11 · Mar 10, 2015 · Updated 10 years ago
carricky / Image_Blend
OpenCV implementation of the poisson image blend and Mean-Value-Coordinate image clone method
☆10 · Nov 14, 2017 · Updated 8 years ago
jxnl / mit-lecture
☆11 · Feb 25, 2025 · Updated last year
lavinal712 / control-lora-v3
☆11 · Dec 15, 2025 · Updated 2 months ago
kaiokendev / cutoff-len-is-context-len
Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit
☆63 · Jun 21, 2023 · Updated 2 years ago
rdi-berkeley / awesome-RLVR-boundary
A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Langu…
☆86 · Dec 12, 2025 · Updated 2 months ago
YuejieGao / TG-CRITIC
TG-CRITIC: A TIMBRE-GUIDED MODEL FOR REFERENCE-INDEPENDENT SINGING EVALUATION
☆15 · May 26, 2023 · Updated 2 years ago
Ternence / MCM2017
MCM 2017
☆17 · Jan 28, 2017 · Updated 9 years ago
kzhai / Papers
☆15 · Feb 22, 2018 · Updated 8 years ago
thu-coai / BARREL
[ICLR 2026] BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
☆17 · May 21, 2025 · Updated 9 months ago
kilian-group / LMLM
☆24 · Oct 31, 2025 · Updated 4 months ago
Nicolas-BZRD / llm-distillation
☆10 · Feb 3, 2025 · Updated last year
HydraQYH / expert_specialization_moe
Expert Specialization MoE Solution based on CUTLASS
☆27 · Jan 19, 2026 · Updated last month
ritikamangla / QSalience
https://arxiv.org/abs/2404.10917
☆14 · Mar 18, 2025 · Updated 11 months ago
ICTMCG / DNA-Det
This is the official repository for the code and datasets in the paper "Deepfake Network Architecture Attribution", AAAI 2022.
☆55 · Jul 4, 2023 · Updated 2 years ago
liunian-harold-li / scotd
☆15 · Apr 15, 2024 · Updated last year
armingh2000 / FactScoreLite
FactScoreLite is an implementation of the FactScore metric, designed for detailed accuracy assessment in text generation. This package bu…
☆13 · Apr 25, 2024 · Updated last year