sanowl / Self-Correcting-LLM--Reinforcement-Learning-
This is my attempt to create a Self-Correcting-LLM based on the paper "Training Language Models to Self-Correct via Reinforcement Learning" by Google.
☆38 · Jul 9, 2025 · Updated 7 months ago
Alternatives and similar repositories for Self-Correcting-LLM--Reinforcement-Learning-
Users who are interested in Self-Correcting-LLM--Reinforcement-Learning- are comparing it to the libraries listed below.
- ☆17 · Mar 3, 2025 · Updated 11 months ago
- Accompanying code for "Boosted Prompt Ensembles for Large Language Models" ☆30 · Apr 13, 2023 · Updated 2 years ago
- ☆32 · Oct 31, 2024 · Updated last year
- ☆35 · Jan 29, 2023 · Updated 3 years ago
- This is an implementation of the paper "Improve Mathematical Reasoning in Language Models by Automated Process Supervision" from Google de… ☆44 · Jul 8, 2025 · Updated 7 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆85 · Mar 7, 2025 · Updated 11 months ago
- Implementation for the paper "Unified Multimodal Model with Unlikelihood Training for Visual Dialog" ☆13 · May 12, 2023 · Updated 2 years ago
- A Caffe/C++ implementation of Deep Deterministic Policy Gradient ☆10 · Feb 1, 2019 · Updated 7 years ago
- ☆14 · Aug 12, 2024 · Updated last year
- Implementation of BIMRL: Brain Inspired Meta Reinforcement Learning - Roozbeh Razavi et al. (IROS 2022) ☆10 · Dec 1, 2022 · Updated 3 years ago
- Set East Asian fonts in PPTX files correctly. ☆12 · Sep 25, 2024 · Updated last year
- Source code for the journal paper "Multiagent Reinforcement Learning With Sparse Interactions by Negotiation and Knowledge Transfer" ☆13 · Dec 26, 2017 · Updated 8 years ago
- Official PyTorch Implementation of Federated Learning with Positive and Unlabeled Data ☆10 · Aug 12, 2022 · Updated 3 years ago
- Adaptive Resonance Theory. Gail A. Carpenter and Stephen Grossberg ☆10 · Feb 10, 2015 · Updated 11 years ago
- A Random Matrix Approach to Extreme Learning Machine ☆15 · Feb 23, 2018 · Updated 7 years ago
- [ACL 2023] Multi-source Semantic Graph-based Multimodal Sarcasm Explanation Generation. ☆10 · Dec 19, 2024 · Updated last year
- Low-rank Tensor Based Proximity Learning for Multi-view Clustering, TKDE2022 ☆11 · Dec 31, 2021 · Updated 4 years ago
- Code for ACL 2024 findings paper "wav2vec-S: Adapting Pre-trained Speech Models for Streaming" ☆10 · Apr 20, 2025 · Updated 9 months ago
- Code and data for the ACM CIKM 2022 paper "Rank List Sensitivity of Recommender Systems to Interaction Perturbations" ☆10 · Aug 16, 2022 · Updated 3 years ago
- The package is developed for treatment recommendation & pairwise treatment individual effect estimation (ITE/CATE/HTE) when multiple trea… ☆11 · Mar 9, 2023 · Updated 2 years ago
- This repository contains several automated feature selection methods for CTR prediction. ☆10 · Dec 18, 2022 · Updated 3 years ago
- ☆10 · Jul 20, 2020 · Updated 5 years ago
- ☆16 · Jul 29, 2025 · Updated 6 months ago
- ☆11 · Oct 29, 2022 · Updated 3 years ago
- ARDrone simulation in Gazebo (for ROS Kinetic and Gazebo 7). Now it can run. ☆10 · Oct 27, 2017 · Updated 8 years ago
- ☆10 · Sep 21, 2020 · Updated 5 years ago
- TransMix: Transformer-based Value Function Decomposition for Cooperative Multi-agent Reinforcement Learning ☆11 · Oct 18, 2022 · Updated 3 years ago
- Code for "Using Embeddings to Correct for Unobserved Confounding" ☆10 · May 31, 2019 · Updated 6 years ago
- The source code of the paper "Compressed Federated Learning Based on Adaptive Local Differential Privacy". ☆10 · Oct 23, 2023 · Updated 2 years ago
- Multi-objective reinforcement learning for COVID-19 control ☆12 · Aug 12, 2021 · Updated 4 years ago
- Low-level autonomous control and tracking of quadrotor using reinforcement learning - Proximal Policy Optimization ☆11 · Dec 2, 2020 · Updated 5 years ago
- Code for ICML 2022 paper: Achieving Fairness at No Utility Cost via Data Reweighing with Influence ☆11 · Aug 3, 2022 · Updated 3 years ago
- ☆14 · Mar 5, 2024 · Updated last year
- ☆17 · Apr 2, 2025 · Updated 10 months ago
- Codebase accompanying the paper 'Widening the Representation Bottleneck in Neural Machine Translation with Lexical Shortcuts', (Emelin, D… ☆11 · Feb 14, 2023 · Updated 2 years ago
- Task models for human robot collaboration ☆12 · Jul 17, 2018 · Updated 7 years ago
- Code for the paper "Optimal Off-Policy Evaluation from Multiple Logging Policies" ☆15 · Jul 17, 2021 · Updated 4 years ago
- ☆26 · Oct 16, 2025 · Updated 3 months ago
- Bias Correction of Learned Generative Models using Likelihood-Free Importance Weighting ☆12 · Mar 24, 2023 · Updated 2 years ago