tengwang0318 / hierarchial_reward_model
Official Code For "Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models"
☆18Mar 25, 2025Updated 10 months ago
Alternatives and similar repositories for hierarchial_reward_model
Users who are interested in hierarchial_reward_model are comparing it to the repositories listed below
- RxImg is an image processing tool based on reactive data streams, allowing image processing pipeline to be built on a low-code graphical …☆14Jan 22, 2023Updated 3 years ago
- Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models☆45Sep 19, 2025Updated 4 months ago
- Call native liquid glass elements from within your Ionic/Capacitor code.☆21Aug 21, 2025Updated 5 months ago
- Source code for the paper "Memory-Efficient Fine-Tuning via Low-Rank Activation Compression"☆13Aug 1, 2025Updated 6 months ago
- CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics☆27Nov 1, 2025Updated 3 months ago
- ProxyExplainer for Graph Neural Networks☆15Oct 24, 2024Updated last year
- Training and testing code from our CVPR 2023 paper "Are Deep Neural Networks SMARTer than Second Graders?"☆11Aug 10, 2023Updated 2 years ago
- Policy Optimization is awesome, let’s put a tree on it! 🌲🌟☆22Jul 4, 2025Updated 7 months ago
- Evaluation Pipeline for medical tasks.☆12Updated this week
- AC No Code is the lazy person's best way to get code accepted (AC) on an online judge (OJ): Write nothing; submit nowhere.☆10May 18, 2020Updated 5 years ago
- MetaTrader 5 indicator that measures the largest distance between a price (high or low) and a moving average.☆11Oct 9, 2020Updated 5 years ago
- Cloak - A Hybrid Development Framework for HarmonyOS☆12May 6, 2025Updated 9 months ago
- [NeurIPS 2024] Official code for the paper 'RankUp: Boosting Semi-Supervised Regression with an Auxiliary Ranking Classifier'☆14Aug 22, 2025Updated 5 months ago
- PhishDecloaker: Detecting CAPTCHA-cloaked Phishing Websites via Hybrid Vision-based Interactive Models☆14Jan 3, 2025Updated last year
- neural_topic_models☆11Jan 9, 2017Updated 9 years ago
- Cloud task scheduling simulation platform☆12Mar 11, 2020Updated 5 years ago
- Instantly fix problems with ChatGPT AI. Use ChatGPT and GPT-4 AI tools to find one-click 'lightbulb menu' solutions to problems in your c…☆12Mar 26, 2023Updated 2 years ago
- Code for the paper "FinRLlama: A Solution to LLM-Engineered Signals Challenge at FinRL Contest 2024"☆13Feb 14, 2025Updated last year
- Python package to download and use the SSB datasets☆11Aug 3, 2023Updated 2 years ago
- Integration test of Verilog AXI modules (https://github.com/alexforencich/verilog-axi) with LiteX.☆17Dec 19, 2022Updated 3 years ago
- KAF : Kolmogorov-Arnold Fourier Networks☆20Feb 19, 2025Updated 11 months ago
- ShanghaiTech SI140A Probability & Statistics for EECS, Spring 2023, Spring 2024.☆24Updated this week
- Synthetic Data Generation with Execution-Based Verification and Grounding for LLM Training.☆19Feb 7, 2025Updated last year
- UFT: Unifying Supervised and Reinforcement Fine-Tuning☆24Jun 30, 2025Updated 7 months ago
- ☆16Feb 23, 2025Updated 11 months ago
- How to hack Snap! Build Your Own Blocks☆10Apr 7, 2015Updated 10 years ago
- The public reproducible analysis code used for the gaze project☆11Dec 26, 2025Updated last month
- 🌈 Add a flowing, smart ribbon to the background.☆13Jan 4, 2023Updated 3 years ago
- Official code for the paper: DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models☆21Jan 6, 2026Updated last month
- Source code of the U-TRR methodology presented in "Uncovering In-DRAM RowHammer Protection Mechanisms: A New Methodology, Custom RowHamme…☆17Nov 15, 2022Updated 3 years ago
- Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?☆15Jun 3, 2025Updated 8 months ago
- Command helper for slurm system. Act as if you are on compute node.☆15Feb 1, 2025Updated last year
- [ECCV 2024] CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs☆18Jul 2, 2024Updated last year
- 🎓Automatically Update circult-eda-mlsys-tinyml Papers Daily using GitHub Actions (Updated Every 8 Hours)☆10Updated this week
- [ICLR 2024]: Is Self-Repair a Silver Bullet for Code Generation?☆15May 2, 2024Updated last year
- Remake developed in C++ from scratch using the SFML library.☆13Jan 7, 2021Updated 5 years ago
- Chain of Images for Intuitively Reasoning☆10Nov 29, 2023Updated 2 years ago
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert…☆14Feb 4, 2025Updated last year
- Official pytorch implementation of the paper: "HomeGAN: Two stage GAN for enhanced floor plan image generation"☆11Aug 9, 2023Updated 2 years ago