Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?
☆19 · Mar 9, 2025 · Updated 11 months ago
Alternatives and similar repositories for cold_start_rl
Users that are interested in cold_start_rl are comparing it to the libraries listed below
- Collections of RLxLM experiments using minimal codes ☆14 · Feb 17, 2025 · Updated last year
- Research work aimed at addressing the problem of modeling infinite-length context ☆46 · Dec 18, 2025 · Updated 2 months ago
- [NeurIPS 2025 D&B Track] Evaluation Code Repo for Paper "PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts" ☆41 · May 22, 2025 · Updated 9 months ago
- ☆76 · Jan 8, 2026 · Updated last month
- Source code for <Sequence-Level Training for Non-Autoregressive Neural Machine Translation>. ☆24 · Jan 17, 2022 · Updated 4 years ago
- ☆80 · Mar 11, 2025 · Updated 11 months ago
- A full-stack online music app, developed using MERN stack (React, Express.js, MongoDB) and Electron. Libraries including Tailwind CSS, Re… ☆10 · Jul 2, 2024 · Updated last year
- Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models) ☆52 · May 12, 2025 · Updated 9 months ago
- JudgeLRM: Large Reasoning Models as a Judge ☆41 · Jan 29, 2026 · Updated last month
- This repository is a reimplementation of the paper(BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model: htt… ☆11 · Nov 14, 2019 · Updated 6 years ago
- Code for the experiments in the ACL 2020 paper "Estimating predictive uncertainty for rumour verification models" ☆11 · May 15, 2020 · Updated 5 years ago
- [ICLR 2026] GRAPE: Group Representational Position Encoding (https://arxiv.org/abs/2512.07805) ☆79 · Jan 27, 2026 · Updated last month
- ☆15 · Jan 9, 2026 · Updated last month
- Math evaluations of llama models. ☆10 · Jan 3, 2024 · Updated 2 years ago
- Generic build server ☆64 · May 25, 2014 · Updated 11 years ago
- ☆12 · Jun 15, 2023 · Updated 2 years ago
- ChatYuan-7B ☆13 · Jun 16, 2023 · Updated 2 years ago
- A sample app to debug and validate cellular modems on balena devices ☆13 · Jun 5, 2019 · Updated 6 years ago
- Code for the paper "Closing the Curious Case of Neural Text Degeneration" ☆11 · Apr 9, 2025 · Updated 10 months ago
- ☆16 · Jul 29, 2025 · Updated 7 months ago
- 🔀 📝 Convert Google Docs files to LaTeX ☆11 · Dec 10, 2025 · Updated 2 months ago
- [ECCV 2022] DH-AUG: DH Forward Kinematics Model Driven Augmentation for 3D Human Pose Estimation ☆12 · Nov 21, 2022 · Updated 3 years ago
- React 0.13 with ES6, Immutable.js and Flux, Isomorphic as well ☆11 · Mar 10, 2015 · Updated 10 years ago
- OpenCV implementation of the poisson image blend and Mean-Value-Coordinate image clone method ☆10 · Nov 14, 2017 · Updated 8 years ago
- ☆11 · Feb 25, 2025 · Updated last year
- ☆11 · Dec 15, 2025 · Updated 2 months ago
- Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit ☆63 · Jun 21, 2023 · Updated 2 years ago
- A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Langu… ☆86 · Dec 12, 2025 · Updated 2 months ago
- TG-CRITIC: A TIMBRE-GUIDED MODEL FOR REFERENCE-INDEPENDENT SINGING EVALUATION ☆15 · May 26, 2023 · Updated 2 years ago
- MCM 2017 ☆17 · Jan 28, 2017 · Updated 9 years ago
- ☆15 · Feb 22, 2018 · Updated 8 years ago
- [ICLR 2026] BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs ☆17 · May 21, 2025 · Updated 9 months ago
- ☆24 · Oct 31, 2025 · Updated 4 months ago
- ☆10 · Feb 3, 2025 · Updated last year
- Expert Specialization MoE Solution based on CUTLASS ☆27 · Jan 19, 2026 · Updated last month
- https://arxiv.org/abs/2404.10917 ☆14 · Mar 18, 2025 · Updated 11 months ago
- This is the official repository for the code and datasets in the paper "Deepfake Network Architecture Attribution", AAAI 2022. ☆55 · Jul 4, 2023 · Updated 2 years ago
- ☆15 · Apr 15, 2024 · Updated last year
- FactScoreLite is an implementation of the FactScore metric, designed for detailed accuracy assessment in text generation. This package bu… ☆13 · Apr 25, 2024 · Updated last year