liziniu / cold_start_rl
Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?
☆19 · Mar 9, 2025 · Updated 11 months ago
Alternatives and similar repositories for cold_start_rl
Users that are interested in cold_start_rl are comparing it to the libraries listed below
- Collection of RLxLM experiments using minimal code ☆14 · Feb 17, 2025 · Updated 11 months ago
- Official repository to release the code and datasets in the paper, "Article Reranking by Memory-enhanced Key Sentence Matching for Detect… ☆19 · Dec 15, 2021 · Updated 4 years ago
- RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs. ☆41 · Oct 31, 2025 · Updated 3 months ago
- ☆76 · Jan 8, 2026 · Updated last month
- ☆80 · Mar 11, 2025 · Updated 11 months ago
- Code for the paper "Preserving Diversity in Supervised Fine-tuning of Large Language Models" ☆51 · May 12, 2025 · Updated 9 months ago
- ☆72 · Jun 10, 2025 · Updated 8 months ago
- Code for the experiments in the ACL 2020 paper "Estimating predictive uncertainty for rumour verification models" ☆11 · May 15, 2020 · Updated 5 years ago
- This repository is a reimplementation of the paper (BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model: htt… ☆11 · Nov 14, 2019 · Updated 6 years ago
- Paddle reimplementation of the paper "ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information" (ACL 2021) ☆10 · Nov 15, 2021 · Updated 4 years ago
- [ICLR 2026] GRAPE: Group Representational Position Encoding (https://arxiv.org/abs/2512.07805) ☆78 · Jan 27, 2026 · Updated 2 weeks ago
- ☆11 · Dec 15, 2025 · Updated last month
- LongAttn: Selecting Long-context Training Data via Token-level Attention ☆15 · Jul 16, 2025 · Updated 6 months ago
- ☆23 · Oct 31, 2025 · Updated 3 months ago
- Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment ☆16 · Aug 6, 2024 · Updated last year
- 30 must-read deep learning papers recommended by Ilya Sutskever (bilingual Chinese-English translated edition) ☆13 · Dec 18, 2024 · Updated last year
- Code for the paper "Closing the Curious Case of Neural Text Degeneration" ☆11 · Apr 9, 2025 · Updated 10 months ago
- ☆15 · Jan 9, 2026 · Updated last month
- Generic build server ☆64 · May 25, 2014 · Updated 11 years ago
- ☆17 · May 3, 2025 · Updated 9 months ago
- Math evaluations of llama models. ☆10 · Jan 3, 2024 · Updated 2 years ago
- Engineering Blog article prototypes ☆17 · Oct 12, 2025 · Updated 4 months ago
- React 0.13 with ES6, Immutable.js and Flux, Isomorphic as well ☆11 · Mar 10, 2015 · Updated 10 years ago
- ☆26 · Nov 7, 2022 · Updated 3 years ago
- OpenCV implementation of the Poisson image blend and Mean-Value-Coordinate image clone methods ☆10 · Nov 14, 2017 · Updated 8 years ago
- Demonstration that finetuning a RoPE model on larger sequences than the pre-trained model adapts the model context limit ☆63 · Jun 21, 2023 · Updated 2 years ago
- A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Langu… ☆85 · Dec 12, 2025 · Updated 2 months ago
- QuoteSum is a textual QA dataset containing Semi-Extractive Multi-source Question Answering (SEMQA) examples written by humans, based on … ☆13 · Mar 25, 2024 · Updated last year
- TG-CRITIC: A TIMBRE-GUIDED MODEL FOR REFERENCE-INDEPENDENT SINGING EVALUATION ☆15 · May 26, 2023 · Updated 2 years ago
- An experiment to see if ChatGPT can improve the output of the Stanford Alpaca dataset ☆12 · Mar 29, 2023 · Updated 2 years ago
- ☆13 · May 21, 2024 · Updated last year
- ☆14 · Oct 9, 2022 · Updated 3 years ago
- Implementation for the paper "Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning" ☆11 · Jan 10, 2025 · Updated last year
- ☆15 · Apr 15, 2024 · Updated last year
- ☆16 · May 21, 2025 · Updated 8 months ago
- Code for paper: Unraveling the Shift of Visual Information Flow in MLLMs: From Phased Interaction to Efficient Inference ☆12 · Jun 7, 2025 · Updated 8 months ago
- Implementation of IntelliJ IDEA code completion plugin using a local LLM. ☆17 · Feb 6, 2026 · Updated last week
- ☆10 · Feb 3, 2025 · Updated last year
- LCA-on-the-line (ICML 2024 Oral) ☆13 · Feb 13, 2025 · Updated last year