CarlanLark / Lp-Reg
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
☆33Updated 3 months ago
Alternatives and similar repositories for Lp-Reg
Users that are interested in Lp-Reg are comparing it to the libraries listed below
- ☆84Updated 10 months ago
- [BIRD-INTERACT] Re-imagines Text-to-SQL evaluation through the lens of dynamic interactions.☆455Updated 3 weeks ago
- (EMNLP 2025 Findings) Source Evaluation scripts for Humanity's Last Code Exam☆95Updated 4 months ago
- The code for Refining Sentence Embedding Model through Ranking Sentences Generation with Large Language Models (Findings of ACL 2025)☆83Updated 6 months ago
- Official Implementation of FastMCTS: A Simple Sampling Strategy for Data Synthesis☆112Updated 6 months ago
- Code for "FaithLens: Detecting and Explaining Faithfulness Hallucination"☆96Updated last week
- ☆71Updated 3 months ago
- ☆50Updated 8 months ago
- MAX31855 full-featured driver library for general-purpose MCU and Linux.☆70Updated 2 months ago
- [ACL 2025 Oral] QAEncoder: Towards Aligned Representation Learning in Question Answering Systems☆176Updated 6 months ago
- A fall-detection system that fuses video and Wi-Fi☆67Updated 4 months ago
- ☆99Updated 11 months ago
- ☆75Updated 7 months ago
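- A modern, feature-rich communication debugging platform built with Qt6 and C++20. The application integrates core features such as serial communication, TCP network communication, a JavaScript scripting engine, and data visualization, giving developers professional data-processing and protocol-parsing capabilities. Supports multiple data formats, real-time monitoring, professional-grade data visualization, and a fully customizable style sys…☆86Updated 3 months ago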
- The code for the paper "Learning from Committee: Reasoning Distillation from a Mixture of Teachers with Peer-Review", accepted at ACL 2025.☆103Updated 7 months ago
- A ChatGPT-based programming approach is proposed to assist in solving engineering computational problems. Using three-dimensional slope s…☆112Updated 4 months ago
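- [Latest international stocks] Codename: Stock-Finex - multilingual stock platform - Features: IPO subscription, block trading, stock margin financing, pledge-based wealth management, online customer service - multiple languages, latest stock source code - stock platform setup - Java stocks☆79Updated 5 months ago
- Provides a set of components for quickly and efficiently building high-quality admin and back-office applications☆18Updated 4 months ago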
- An Interactive Fiction Demo Powered by AI Dungeon☆84Updated 3 months ago
- Extends Netty with utility classes and adds support for serial communication☆88Updated last week
- [VLDB 2025] SimRN: Trajectory Similarity Learning in Road Networks based on Distributed Deep Reinforcement Learning☆106Updated 8 months ago
- ☆79Updated 11 months ago
- Uses a multiple regression model to analyze and predict stock prices☆41Updated 11 months ago
- ☆48Updated 2 months ago
- A modern web application for the Melbourne University Ultimate Frisbee Club, built with Next.js 15, TypeScript, and Tailwind CSS. This pl…☆101Updated 5 months ago
- Borrows ideas from an expert's approach; a small part is original☆71Updated 8 months ago
- ☆114Updated 3 weeks ago
- A comprehensive, production-ready framework for building intelligent AI agents with advanced capabilities including tool calling, persist…☆163Updated 4 months ago
- ☆121Updated 3 weeks ago
- The code to implement the distributional reinforcement learning algorithm.☆60Updated 4 months ago