Tim-Siu/reft-exp

A research repo for experiments about Reinforcement Finetuning

☆55

Alternatives and similar repositories for reft-exp

Users who are interested in reft-exp are comparing it to the repositories listed below.

Unakar / Logic-RL
Reproduce R1 Zero on Logic Puzzle
☆2,450 · Mar 20, 2025 · Updated last year
Leey21 / A-Data-Centric-Study
☆18 · Mar 2, 2026 · Updated 4 months ago
kztakemoto / simbaja
All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks
☆17 · Apr 24, 2024 · Updated 2 years ago
Gen-Verse / GenEnv
GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators
☆62 · Dec 23, 2025 · Updated 7 months ago
hkust-nlp / model-task-align-rl
[ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".
☆18 · Feb 9, 2026 · Updated 5 months ago
NiuTrans / GRAM
Code for ICML 2025 paper "GRAM: A Generative Foundation Reward Model for Reward Generalization"
☆21 · Sep 4, 2025 · Updated 10 months ago
shidilrzf / Anti-exploration-RL
Anti exploration in offline reinforcement learning
☆11 · May 17, 2021 · Updated 5 years ago
googleinterns / localizing-paragraph-memorization
☆15 · Feb 21, 2024 · Updated 2 years ago
nishadsinghi / sc-genrm-scaling
[COLM 2025] Official code for "When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoni…
☆15 · Oct 31, 2025 · Updated 8 months ago
hkust-nlp / simpleRL-reason
Simple RL training for reasoning
☆3,871 · Dec 23, 2025 · Updated 7 months ago
LiuAmber / RAHF
[ACL 2024 main] Aligning Large Language Models with Human Preferences through Representation Engineering (https://aclanthology.org/2024.…
☆28 · Sep 25, 2024 · Updated last year
hkust-nlp / Laser
[ICLR2026] Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
☆66 · May 22, 2025 · Updated last year
vfleaking / PTST
Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates"
☆22 · Sep 21, 2025 · Updated 10 months ago
luyaojie / chinese-nlp-conference-resource
☆30 · Dec 24, 2019 · Updated 6 years ago
JiahangOK / MEMC_course
THU Mathematics for Engineering Master Candidates (Tsinghua University mathematics course for engineering master's students)
☆11 · Nov 21, 2021 · Updated 4 years ago
AlphaPav / mem-kk-logic
On Memorization of Large Language Models in Logical Reasoning
☆79 · Mar 29, 2025 · Updated last year
miniHuiHui / SimpleRL-reason-GRPO
☆12 · Feb 27, 2025 · Updated last year
tianyi-lab / MiP-Overthinking
[COLM'25] Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
☆39 · Jun 5, 2025 · Updated last year
Dogacel / Attention-Drift
Code for the paper *Attention Drift: What Speculative Decoding Models Learn*.
☆28 · May 12, 2026 · Updated 2 months ago
SolidShen / RIPPLE_official
☆20 · Feb 11, 2024 · Updated 2 years ago
google-deepmind / exedec
☆14 · May 9, 2024 · Updated 2 years ago
zhangir-azerbayev / MetaMath
☆11 · Oct 11, 2023 · Updated 2 years ago
cychomatica / FreeDave
Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models
☆23 · May 19, 2026 · Updated 2 months ago
tianjunz / MADE
☆19 · Jul 18, 2021 · Updated 5 years ago
bubble65 / DLLM-Searcher
DLLM-Searcher has been accepted by SIGIR 2026! 🥳
☆33 · Jan 23, 2026 · Updated 6 months ago
piesauce / awesome-dLLM-resources
Frequently updated list of dLLM (Diffusion Large Language Models) papers, models, and other resources
☆53 · Jul 19, 2026 · Updated last week
git-disl / Safety-Tax
This is the official code for the paper "Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable".
☆35 · Mar 11, 2025 · Updated last year
vmicheli / lm-butlers
☆12 · Aug 30, 2021 · Updated 4 years ago
MobileLLM / ParaThinker
☆48 · Nov 1, 2025 · Updated 8 months ago
JacksonWuxs / Interpret_Instruction_Tuning_LLMs
Understanding Why and How Instruction Tuning Changes Pre-trained Models
☆25 · Mar 18, 2024 · Updated 2 years ago
zkcpku / HiT-hierarchy-transformer
Code for "Implant Global and Local Hierarchy Information to Sequence based Code Representation Models"
☆12 · Dec 13, 2024 · Updated last year
princeton-nlp / benign-data-breaks-safety
☆47 · Oct 1, 2024 · Updated last year
DT6A / ReBRAC
Author's implementation of ReBRAC, a minimalist improvement upon TD3+BC
☆19 · Oct 22, 2023 · Updated 2 years ago
microsoft / lightATAC
A lightweight reimplementation of Adversarially Trained Actor Critic
☆19 · Mar 19, 2026 · Updated 4 months ago
Bollegala / DARep
Cross-domain word representation learning
☆10 · May 23, 2015 · Updated 11 years ago
yihuaihong / ConceptVectors
[EMNLP 2025 Main] ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"
☆40 · Aug 20, 2025 · Updated 11 months ago
SuperBruceJia / Awesome-LLM-Self-Consistency
Awesome LLM Self-Consistency: a curated list of Self-consistency in Large Language Models
☆129 · Jul 20, 2025 · Updated last year
RUCAIBox / SimpleDeepSearcher
SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis
☆120 · Jun 3, 2025 · Updated last year
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆58 · Aug 20, 2024 · Updated last year