Mryangkaitong / deepseek-r1-gsm8k
☆37 Updated 2 months ago
Alternatives and similar repositories for deepseek-r1-gsm8k:
Users who are interested in deepseek-r1-gsm8k are comparing it to the repositories listed below.
- Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process ☆26 Updated 8 months ago
- ☆81 Updated last year
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight) ☆124 Updated 2 months ago
- Repository for Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning ☆162 Updated last year
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation ☆78 Updated 5 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆52 Updated 4 months ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ ☆42 Updated 5 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing". ☆75 Updated 3 months ago
- ☆98 Updated 6 months ago
- [ICML'2024] Can AI Assistants Know What They Don't Know? ☆79 Updated last year
- ☆62 Updated 4 months ago
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning". ☆117 Updated 5 months ago
- ☆139 Updated last year
- The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee… ☆39 Updated 8 months ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆61 Updated 5 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆147 Updated 7 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆153 Updated 10 months ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆136 Updated 2 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆130 Updated 3 months ago
- [SIGIR'24] The official implementation code of MOELoRA. ☆159 Updated 8 months ago
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆169 Updated 3 months ago
- A curated reading list for large language model (LLM) alignment. Take a look at our new survey "Large Language Model Alignment: A Survey"… ☆78 Updated last year
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey" ☆111 Updated 6 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆76 Updated last year
- Fantastic Data Engineering for Large Language Models ☆87 Updated 3 months ago
- A research repo for experiments about Reinforcement Finetuning ☆43 Updated last week
- SOTA RL fine-tuning solution for advanced math reasoning of LLM ☆103 Updated last week
- [ICML 2024] Selecting High-Quality Data for Training Language Models ☆166 Updated 9 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆57 Updated 6 months ago
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat ☆115 Updated last year