Chengsong-Huang / Self-Calibration
Code for "Efficient Test-Time Scaling via Self-Calibration"
☆14Updated last month
Alternatives and similar repositories for Self-Calibration:
Users who are interested in Self-Calibration are comparing it to the repositories listed below
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025.☆20Updated 2 months ago
- ☆22Updated 2 weeks ago
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning☆23Updated 2 weeks ago
- ☆44Updated 6 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆16Updated 4 months ago
- ☆94Updated last month
- Code for "A Sober Look at Progress in Language Model Reasoning" paper☆36Updated last week
- ☆17Updated 2 months ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."☆41Updated 6 months ago
- ☆55Updated 6 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".☆52Updated 5 months ago
- ☆22Updated 9 months ago
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs☆133Updated last month
- A Survey on the Honesty of Large Language Models☆57Updated 4 months ago
- This is the official GitHub repository for our survey paper "Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language …☆25Updated last week
- Source code of paper: A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models.☆18Updated 3 weeks ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy☆60Updated 4 months ago
- [preprint] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA.☆43Updated 4 months ago
- PyTorch implementation of Tree Preference Optimization (TPO) (Accepted by ICLR'25)☆13Updated this week
- Model merging is a highly efficient approach for long-to-short reasoning.☆42Updated last month
- The demo, code and data of FollowRAG☆72Updated this week
- ☆54Updated 2 weeks ago
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering☆57Updated 4 months ago
- ☆14Updated 4 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".☆76Updated 3 months ago
- ☆95Updated last month
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective☆63Updated last month
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433 ☆25Updated 4 months ago
- [ICLR'25] DataGen: Unified Synthetic Dataset Generation via Large Language Models☆46Updated last month
- ☆17Updated last week