Implements reinforcement learning training for LLMs such as GPT-2, LLaMA, BLOOM, and so on
☆27Sep 19, 2023Updated 2 years ago
Alternatives and similar repositories for llm_rlhf
Users that are interested in llm_rlhf are comparing it to the libraries listed below.
- chatglm_rlhf_finetuning☆30Oct 10, 2023Updated 2 years ago
- Large language model fine-tuning for BLOOM, OPT, GPT, GPT-2, LLaMA, LLaMA-2, CPM-Ant, and so on☆100Apr 24, 2024Updated 2 years ago
- share data, prompt data, pretraining data☆36Nov 30, 2023Updated 2 years ago
- Code and data for the VLDB 2023 paper: RECA: Related Tables Enhanced Column Semantic Type Annotation Framework☆12May 7, 2025Updated last year
- The source code of the Sudowoodo paper in ICDE 2023☆19May 24, 2023Updated 3 years ago
- Reinforcement learning (RL) is an effective method to find reasoning pathways in incomplete knowledge graphs (KGs). To overcome the chall…☆26Oct 13, 2024Updated last year
- ☆16May 31, 2024Updated last year
- Training a reward model for RLHF using RWKV.☆15Jun 5, 2023Updated 2 years ago
- This repo contains code for paper: "Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach".☆25Oct 21, 2024Updated last year
- AdaLoGN: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension (ACL 2022)☆27May 20, 2022Updated 4 years ago
- Prompt Fine-tuning on GLM, BART and Flan-T5.☆21Jan 20, 2023Updated 3 years ago
- ☆13Apr 10, 2025Updated last year
- ☆32Apr 15, 2023Updated 3 years ago
- aigc_serving: lightweight and efficient language model inference service☆24Jun 12, 2024Updated last year
- LangChain Agent☆11Nov 25, 2025Updated 6 months ago
- [TPAMI] "Symbolic Visual Reinforcement Learning: A Scalable Framework with Object-Level Abstraction and Differentiable Expression Search"…☆18Jan 4, 2023Updated 3 years ago
- ☆22May 28, 2025Updated 11 months ago
- Natural Language Processing Toolkit for Neuroscience☆27Dec 4, 2024Updated last year
- HealthFC: Verifying Health Claims with Evidence-Based Medical Fact-Checking☆13Apr 11, 2025Updated last year
- Source code for the Paper "Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models"☆19Feb 1, 2026Updated 3 months ago
- Apply RLHF directly to ChatGLM to raise or lower the probability of target outputs | Modify ChatGLM output with only RLHF☆197May 23, 2023Updated 3 years ago
- ☆18Dec 8, 2024Updated last year
- Implementation of "The Power of Scale for Parameter-Efficient Prompt Tuning"☆59Jun 27, 2022Updated 3 years ago
- Math24o: High School Olympiad Mathematics Chinese Benchmark☆12Mar 27, 2025Updated last year
- Unofficial baselines for ManiSkill, including RL and BC algorithms.☆21Jun 6, 2024Updated last year
- [ICLR 2024] Adaptive Replay Ratio implementation from 'Revisiting Plasticity in Visual RL: Data, Modules and Training Stages'.☆13Oct 9, 2024Updated last year
- ☆16Apr 28, 2023Updated 3 years ago
- Chinese few-shot NER☆16Aug 28, 2022Updated 3 years ago
- fast trainer for educational purposes☆26May 4, 2026Updated 3 weeks ago
- AIGC evals☆10Dec 2, 2023Updated 2 years ago
- [ACL 2023] Official resources of "HAHE: Hierarchical Attention for Hyper-Relational Knowledge Graphs in Global and Local Level".☆28Aug 18, 2025Updated 9 months ago
- Code for the paper "Abstractive Summarization Guided by Latent Hierarchical Document Structure"☆13May 20, 2023Updated 3 years ago
- Code for paper 'Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse'☆14Aug 2, 2024Updated last year
- [ICML 2023] "Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?" by Ruisi Cai, Zhenyu Zhang, Zhangyang Wang☆16May 4, 2023Updated 3 years ago
- Your finetuned model's back to its original safety standards faster than you can say "SafetyLock"!☆11Oct 16, 2024Updated last year
- A very simple chat application using Spring Boot, Vue.js (in TypeScript), gRPC, gRPC-Web and EnvoyProxy.☆10May 20, 2019Updated 7 years ago
- High-performance C++ memory pool☆11May 10, 2021Updated 5 years ago
- LLM Compression Benchmark☆22Apr 8, 2026Updated last month
- Prioritized Generative Replay (ICLR 2025 Oral)☆29Mar 1, 2025Updated last year