HenryLau7 / CFPO
☆20 Updated 4 months ago
Alternatives and similar repositories for CFPO
Users who are interested in CFPO are comparing it to the repositories listed below.
- Lottery Ticket Adaptation ☆39 Updated 6 months ago
- Exploration of automated dataset selection approaches at large scales. ☆42 Updated 3 months ago
- ☆32 Updated 5 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆34 Updated last year
- Aioli: A unified optimization framework for language model data mixing ☆27 Updated 4 months ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆15 Updated 3 weeks ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 Updated last year
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆25 Updated 2 months ago
- ☆45 Updated 3 months ago
- ☆16 Updated 10 months ago
- The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆31 Updated last week
- Control LLM ☆14 Updated 2 months ago
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients ☆26 Updated 8 months ago
- Make reasoning models scalable ☆35 Updated last week
- ☆14 Updated 3 weeks ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆37 Updated last year
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 Updated last year
- A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models ☆23 Updated 6 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆28 Updated 8 months ago
- Extensive Self-Contrast Enables Feedback-Free Language Model Alignment ☆21 Updated last year
- implementation of dualformer ☆17 Updated 3 months ago
- Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF ☆18 Updated 8 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆26 Updated 7 months ago
- Plancraft is a Minecraft environment and agent suite to test planning capabilities in LLMs ☆15 Updated 3 weeks ago
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers ☆18 Updated 3 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆47 Updated last month
- Official Repository for Task-Circuit Quantization ☆20 Updated last week
- ☆24 Updated 8 months ago
- ☆31 Updated 7 months ago
- ☆79 Updated 9 months ago