A simple implementation of ReasonGenRM.
☆19Apr 21, 2025Updated 10 months ago
Alternatives and similar repositories for ReasonGenRM
Users that are interested in ReasonGenRM are comparing it to the libraries listed below
Sorting:
- [ACL'24] WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations☆13Sep 11, 2024Updated last year
- Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?☆19Mar 9, 2025Updated 11 months ago
- [ICML 2025] Official code of "AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization"☆30Jan 10, 2026Updated last month
- GenRM-CoT: Data release for verification rationales☆68Oct 16, 2024Updated last year
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | 继续预训练提升 …☆37May 31, 2025Updated 9 months ago
- JudgeLRM: Large Reasoning Models as a Judge☆41Jan 29, 2026Updated last month
- ☆264May 14, 2025Updated 9 months ago
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"☆10Dec 13, 2024Updated last year
- 用Paddle复现论文ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information(ACL2021)☆10Nov 15, 2021Updated 4 years ago
- PyTorch Implementation for the paper "Let Me Help You! Neuro-Symbolic Short-Context Action Anticipation" accepted to RA-L'24.☆12Nov 27, 2024Updated last year
- 在监控画质下实现对校园自行车的重识别,包含REID模型识别,向量数据库检索,UI展示☆10Feb 13, 2024Updated 2 years ago
- ☆10Oct 11, 2022Updated 3 years ago
- Ilya Sutskever 推荐的30篇Deep learning 必读论文 (中英文对照翻译版)☆13Dec 18, 2024Updated last year
- ☆24Oct 31, 2025Updated 4 months ago
- ☆12Jun 16, 2023Updated 2 years ago
- ☆14Jul 18, 2025Updated 7 months ago
- ☆11Dec 15, 2025Updated 2 months ago
- Master the techniques of function-calling and structured data extraction with LLMs. Learn to enhance LLM capabilities, integrate web serv…☆12Jun 29, 2024Updated last year
- About The corresponding code from our paper " Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning…☆13Jan 14, 2026Updated last month
- ☆26Nov 7, 2022Updated 3 years ago
- LongAttn :Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 7 months ago
- A Controllable Model of Grounded Response Generation (AAAI 21)☆13Oct 25, 2022Updated 3 years ago
- ThetaEvolve: Test-time Learning on Open Problems, enabling RL training on AlphaEvolve/OpenEvolve and emphasizing scaling test-time comput…☆132Updated this week
- simple demo to visualize the details of one of the basic foundations of deep learning: convolution☆12Feb 22, 2019Updated 7 years ago
- The official repository for the CodeGym project: "Generalizable End-to-End Tool-Use RL with Synthetic CodeGym"☆23Oct 14, 2025Updated 4 months ago
- ☆18May 3, 2025Updated 10 months ago
- Code for a multi-agent particle environment used in the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"☆11Jan 15, 2020Updated 6 years ago
- ☆11Jun 5, 2023Updated 2 years ago
- Fully open reproduction of DeepSeek-R1☆12Mar 24, 2025Updated 11 months ago
- Code for the paper "Closing the Curious Case of Neural Text Degeneration"☆11Apr 9, 2025Updated 10 months ago
- Math evaluations of llama models.☆10Jan 3, 2024Updated 2 years ago
- A project designed to build and render a full Minecraft crafting tree.☆10Aug 10, 2021Updated 4 years ago
- Unofficial implementation of Chain of Hindsight (https://arxiv.org/abs/2302.02676) using pytorch and huggingface Trainers.☆11Apr 5, 2023Updated 2 years ago
- How to really install tensorflow-gpu from source on a clean instance of Ubuntu