Zcy233035/rl-explainer

☆195

Alternatives and similar repositories for rl-explainer

Users who are interested in rl-explainer are comparing it to the repositories listed below.

Leey21 / CipherBank
☆13 · Jun 13, 2025 · Updated last year
AI45Lab / DataElf
DataElf is an intelligent data workflow engine that turns natural-language tasks into secure, extensible, and executable data pipelines.
☆23 · Updated this week
tsynbio / AutoPE
☆14 · Jan 8, 2025 · Updated last year
Niklauseik / FiLM-Benchmark
Benchmark pipeline for evaluating language models on financial tasks, including sentiment analysis and credit scoring. Supports over ten …
☆11 · Sep 17, 2024 · Updated last year
Leey21 / A-Data-Centric-Study
☆18 · Mar 2, 2026 · Updated 4 months ago
Leey21 / Awesome-Long-CoT-Data
Awesome Long-CoT Data
☆22 · Mar 26, 2025 · Updated last year
thinkwee / AwesomeOPD
Awesome List for On-Policy Distillation
☆759 · Jun 23, 2026 · Updated 3 weeks ago
arrowonstr / LLM-Handwritten-Template
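Contains hand-written, from-scratch code for LLM topics such as reinforcement learning. It helps readers understand the principles at the code level and prepare for from-scratch coding questions that may come up in LLM interviews. More from-scratch implementations, such as Transformer, will be added later.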
☆117 · Mar 15, 2026 · Updated 4 months ago
Trae1ounG / PaperPlotHub
Open, AI-reviewed marketplace of academic paper plotting scripts for global researchers
☆70 · Apr 26, 2026 · Updated 2 months ago
ChangyuChen347 / MaskedThought
[ACL 2024] Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models
☆27 · Jul 9, 2024 · Updated 2 years ago
VickiCui / MORE
Code release for "MORE: Multi-mOdal REtrieval Augmented Generative Commonsense Reasoning"
☆11 · Oct 11, 2024 · Updated last year
thinkwee / AgentsMeetRL
Awesome List for Agentic RL
☆1,701 · Jun 20, 2026 · Updated last month
kidist-amde / ddro
We introduce the direct document relevance optimization (DDRO) for training a pairwise ranker model. DDRO encourages the model to focus o…
☆39 · Jul 2, 2026 · Updated 2 weeks ago
XueZeyue / Awesome-Visual-Generation-Alignment-Survey
A survey for visual generation alignment
☆144 · Nov 9, 2025 · Updated 8 months ago
GAIR-NLP / lm-open-science-evaluation
Reproducible and flexible LLM evaluations for scientific reasoning.
☆29 · Jul 23, 2025 · Updated 11 months ago
thunlp / duplex-model
☆48 · Aug 17, 2024 · Updated last year
chenhao2345 / UCR
Unsupervised Lifelong Person Re-identification via Contrastive Rehearsal
☆11 · Apr 7, 2022 · Updated 4 years ago
Mr-Loevan / DPO-Survey
[TPAMI 2026] A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications
☆16 · Jun 9, 2026 · Updated last month
vimar-gu / ColorPromptReID
Color Prompting for Data-Free Continual Unsupervised Domain Adaptive Person Re-Identification
☆10 · Aug 22, 2023 · Updated 2 years ago
lrhammond / almanac
Implementation and evaluation of Almanac (Automaton/Logic Multi-Agent Natural Actor-Critic), an algorithm for multi-agent reinforcement l…
☆10 · May 5, 2022 · Updated 4 years ago
PeterGriffinJin / Search-R1
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
☆5,123 · Nov 13, 2025 · Updated 8 months ago
Furyton / GR-as-MVDR
[SIGIR'24] Generative Retrieval as Multi-Vector Dense Retrieval
☆36 · Oct 18, 2024 · Updated last year
ZHAOoops / AI-Notes
AI notes by the Bilibili creator 东川路第一可爱猫猫虫
☆274 · May 2, 2026 · Updated 2 months ago
langfengQ / verl-agent
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in…
☆2,138 · Jun 9, 2026 · Updated last month
XinshuangL / SELF-PARAM
The official implementation of the paper "Self-Updatable Large Language Models by Integrating Context into Model Parameters"
☆15 · May 18, 2025 · Updated last year
THUDM / slime
slime is an LLM post-training framework for RL Scaling.
☆7,551 · Updated this week
NISPLab / CleanSheet
Code and full version of the paper "Hijacking Attacks against Neural Network by Analyzing Training Data"
☆14 · Feb 28, 2024 · Updated 2 years ago
DannyWANGD / PaperBrain
An intelligent academic paper reading tool that keeps up with cutting-edge research and builds a local Obsidian knowledge base.
☆36 · Jun 6, 2026 · Updated last month
uob-TextAnalytics / text_labs_public
Lab notebooks for Text Analytics
☆15 · Apr 21, 2026 · Updated 3 months ago
SingularGuyLeBorn / Awesome-LLM-From-Scratch-Ultimate-Tutorial
☆16 · Nov 25, 2025 · Updated 7 months ago
October2001 / ProLong
[ACL 2024 (Oral)] A Prospector of Long-Dependency Data for Large Language Models
☆61 · Jul 23, 2024 · Updated last year
black-yt / ReaLS
Exploring Representation-Aligned Latent Space for Better Generation
☆19 · Mar 17, 2026 · Updated 4 months ago
google-deepmind / constrained_optidice
☆10 · Sep 9, 2022 · Updated 3 years ago
huybery / GDPnet
GDPnet: "Geometry-guided Dense Perspective Network for Speech-Driven Facial Animation." (TVCG 2021)
☆11 · Nov 21, 2021 · Updated 4 years ago
yujmo / arXiv-template
☆17 · Jul 10, 2025 · Updated last year
scwangdyd / large_vocabulary_hoi_detection
Code for ICCV2021: Discovering Human Interactions with Large-Vocabulary Objects via Query and Multi-Scale Detection
☆28 · Oct 12, 2021 · Updated 4 years ago
zepingyu0512 / in-context-mechanism
code for EMNLP 2024 paper: How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for M…
☆13 · Nov 17, 2024 · Updated last year
xbq1994 / Feature-Recovery-Transformer
Code of "Learning Feature Recovery Transformer for Occluded Person Re-identification" (TIP)
☆10 · Dec 28, 2022 · Updated 3 years ago
sqs-ustc / tool-reasoning-framework-PTE
☆38 · Jan 1, 2026 · Updated 6 months ago