A research repo for experiments about Reinforcement Finetuning
☆54Apr 7, 2025Updated 11 months ago
Alternatives and similar repositories for reft-exp
Users who are interested in reft-exp are comparing it to the libraries listed below.
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".☆15Feb 9, 2026Updated 3 weeks ago
- Reproduce R1 Zero on Logic Puzzle☆2,439Mar 20, 2025Updated 11 months ago
- ☆44Nov 17, 2024Updated last year
- Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates"☆22Sep 21, 2025Updated 5 months ago
- ☆15Feb 21, 2024Updated 2 years ago
- All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks☆18Apr 24, 2024Updated last year
- ☆23Jan 17, 2025Updated last year
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.☆67May 5, 2025Updated 10 months ago
- [ACL 2024 main] Aligning Large Language Models with Human Preferences through Representation Engineering (https://aclanthology.org/2024.…☆28Sep 25, 2024Updated last year
- ☆22Sep 20, 2023Updated 2 years ago
- [COLM'25] Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?☆37Jun 5, 2025Updated 9 months ago
- [NeurIPS 2024] Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling☆34Nov 8, 2024Updated last year
- TAP: An automated jailbreaking method for black-box LLMs☆222Dec 10, 2024Updated last year
- ☆30Dec 24, 2019Updated 6 years ago
- On Memorization of Large Language Models in Logical Reasoning☆74Mar 29, 2025Updated 11 months ago
- Simple RL training for reasoning☆3,830Dec 23, 2025Updated 2 months ago
- Public teaching materials for Reasoning and Agents☆12May 29, 2025Updated 9 months ago
- CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics☆27Nov 1, 2025Updated 4 months ago
- Public code repo for COLING 2025 paper "Aligning LLMs with Individual Preferences via Interaction"☆41Apr 3, 2025Updated 11 months ago
- ☆33Mar 13, 2025Updated 11 months ago
- [EMNLP 2025 Main] ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"☆39Aug 20, 2025Updated 6 months ago
- Embodied-Planner-R1: Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning☆25Jan 5, 2026Updated 2 months ago
- Evolutionary Multi-objective Optimization based Neural Architecture Search for Cognitive Diagnosis☆12Sep 5, 2024Updated last year
- Applescripts for controlling Spotify☆23Oct 20, 2016Updated 9 years ago
- [NeurIPS 2025] Official Implementation of paper "Sherlock: Self-Correcting Reasoning in Vision-Language Models"☆28Sep 18, 2025Updated 5 months ago
- Using the pvanet framework to train mobilenet-v2 for object detection, paper: https://arxiv.org/abs/1611.08588☆13Feb 13, 2019Updated 7 years ago
- ☆10Oct 2, 2024Updated last year
- Reading comprehension based question-answering model for news articles.☆11Jun 22, 2022Updated 3 years ago
- [CVPR 2026] Official repo for "VideoSSR: Video Self-Supervised Reinforcement Learning"☆33Nov 11, 2025Updated 3 months ago
- New York Times Scraper☆11Feb 19, 2024Updated 2 years ago
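- Some of my practices on Algorithms : ) This repository holds some of my solutions to problems from LeetCode and 剑指Offer, with the necessary comments kept in the code. They are not necessarily the optimal solutions, but I try to keep the code concise and easy to understand. Other problem sets will be added later. If you find any errors, I hope you can tell me or help me…☆11Dec 3, 2024Updated last year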
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization☆52Jul 15, 2025Updated 7 months ago
- ☆44Oct 1, 2024Updated last year
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888☆37Jun 10, 2024Updated last year
- ☆39May 21, 2024Updated last year
- ☆39May 19, 2023Updated 2 years ago
- Cloak - A Hybrid Development Framework for HarmonyOS☆12May 6, 2025Updated 10 months ago
- ACL24☆11Jun 7, 2024Updated last year
- Some Pwn Challenges from winesap.☆14Aug 15, 2019Updated 6 years ago